macos-local-voice
Local STT and TTS on macOS using native Apple capabilities. Speech-to-text via yap (Apple Speech.framework), text-to-speech via say + ffmpeg. Fully offline, no API keys required. Includes voice quality detection and smart voice selection.
Install via ClawdBot CLI:
clawdbot install STRRL/macos-local-voice
Fully local speech-to-text (STT) and text-to-speech (TTS) on macOS. No API keys, no network, no cloud. All processing happens on-device.
Requirements:
- yap CLI in PATH: install via brew install finnvoor/tools/yap
- ffmpeg in PATH (optional, needed for ogg/opus output): brew install ffmpeg
- say and osascript are macOS built-ins

Transcribe an audio file to text using Apple's on-device speech recognition.
node {baseDir}/scripts/stt.mjs <audio_file> [locale]
- audio_file: path to audio (ogg, m4a, mp3, wav, etc.)
- locale: optional, e.g. zh_CN, en_US, ja_JP. If omitted, uses the system default.

Use node {baseDir}/scripts/stt.mjs --locales to list all supported locales.
Key locales: en_US, en_GB, zh_CN, zh_TW, zh_HK, ja_JP, ko_KR, fr_FR, de_DE, es_ES, pt_BR, ru_RU, vi_VN, th_TH.
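The STT call above can be wrapped in a small Node helper. This is only a sketch under stated assumptions: buildSttArgs, transcribe, and LOCALE_PATTERN are hypothetical names, the locale pattern is inferred from the examples listed here (the --locales output remains the authoritative list), and it assumes stt.mjs prints the transcript to stdout, which the skill description does not state.

```javascript
// Hypothetical helper, not part of the skill.
// Assumption: locales look like en_US or zh_CN (language code, underscore, region code).
const LOCALE_PATTERN = /^[a-z]{2,3}_[A-Z]{2}$/;

// Builds the argument list for: node {baseDir}/scripts/stt.mjs <audio_file> [locale]
function buildSttArgs(baseDir, audioFile, locale) {
  if (!audioFile) throw new Error("audioFile is required");
  if (locale !== undefined && !LOCALE_PATTERN.test(locale)) {
    throw new Error(`Unexpected locale format: ${locale}`);
  }
  const args = [`${baseDir}/scripts/stt.mjs`, audioFile];
  if (locale !== undefined) args.push(locale); // omitted means system default
  return args;
}

// Runs the script and resolves with its trimmed stdout.
// Assumption: stt.mjs writes the transcript to stdout and exits non-zero on failure.
async function transcribe(baseDir, audioFile, locale) {
  // Dynamic imports keep the snippet usable from both ES modules and CommonJS.
  const { execFile } = await import("node:child_process");
  const { promisify } = await import("node:util");
  const run = promisify(execFile);
  const { stdout } = await run("node", buildSttArgs(baseDir, audioFile, locale));
  return stdout.trim();
}
```

Passing arguments as an array through execFile avoids shell quoting problems with file names that contain spaces.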
Convert text to an audio file using macOS native TTS.
node {baseDir}/scripts/tts.mjs "<text>" [voice_name] [output_path]
- text: the text to speak
- voice_name: optional, e.g. Yue (Premium), Tingting, Ava (Premium). If omitted, auto-selects the best available voice based on text language.
- output_path: optional, defaults to a timestamped file in ~/.openclaw/media/outbound/

If ffmpeg is available, output is ogg/opus (ideal for messaging platforms). Otherwise aiff.

After generating the audio file, send it using the message tool:
message action=send media=<path_from_tts.mjs> asVoice=true
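The generate-then-send flow can be sketched as two small helpers. These are illustrative only: buildTtsArgs and formatSendAction are hypothetical names, and the rule that an output path needs a voice name before it is an inference from the positional usage line, not something the skill documents explicitly.

```javascript
// Hypothetical helpers, not part of the skill.

// Builds the argument list for:
//   node {baseDir}/scripts/tts.mjs "<text>" [voice_name] [output_path]
function buildTtsArgs(baseDir, text, voice, outputPath) {
  if (!text) throw new Error("text is required");
  // Assumption: arguments are positional, so output_path cannot be
  // given unless voice_name occupies the slot before it.
  if (outputPath !== undefined && voice === undefined) {
    throw new Error("outputPath needs a voice name before it");
  }
  const args = [`${baseDir}/scripts/tts.mjs`, text];
  if (voice !== undefined) args.push(voice);
  if (outputPath !== undefined) args.push(outputPath);
  return args;
}

// Formats the message tool invocation for a generated audio file.
function formatSendAction(mediaPath) {
  return `message action=send media=${mediaPath} asVoice=true`;
}
```

For example, buildTtsArgs(baseDir, "Hello", "Ava (Premium)", "/tmp/hello.ogg") yields an argument array that can be passed to execFile("node", args), and formatSendAction("/tmp/hello.ogg") produces the send line shown above.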
List available voices, check readiness, or find the best voice for a language:
node {baseDir}/scripts/voices.mjs list [locale] # List voices, optionally filter by locale
node {baseDir}/scripts/voices.mjs check "<name>" # Check if a specific voice is downloaded and ready
node {baseDir}/scripts/voices.mjs best <locale> # Get the highest quality voice for a locale
Tell the user: "Voice X is not downloaded. Go to System Settings → Accessibility → Spoken Content → System Voice → Manage Voices to download it."
- The say command silently falls back to a default voice if the requested voice is not available (exit code 0, no error). Always use voices.mjs check before calling tts.mjs with a specific voice name.
- Premium voices (Yue (Premium), Ava (Premium)) sound significantly better but must be manually downloaded by the user.

Generated Mar 1, 2026
Enables local, offline voice interactions for customer support bots on macOS, handling inquiries in languages like English, Chinese, or Japanese without cloud dependencies. Ideal for privacy-sensitive industries where data must stay on-device, reducing latency and API costs.
Provides text-to-speech and speech-to-text capabilities for educational apps on macOS, assisting students with disabilities by converting study materials to audio or transcribing lectures locally. Supports multiple languages to cater to diverse learners in offline environments.
Facilitates voiceover generation and transcription for media producers on macOS, allowing creators to dub videos or transcribe interviews in languages such as Spanish or French without internet access. Uses high-quality premium voices for professional audio output.
Assists healthcare professionals on macOS by transcribing patient consultations locally to maintain confidentiality, with support for medical terminology in languages like German or Russian. Integrates with voice notes for secure, offline record-keeping without cloud risks.
Powers offline voice assistants for travel apps on macOS, enabling tourists to get spoken translations or transcribe local conversations in languages such as Thai or Vietnamese. Leverages smart voice selection for natural interactions in diverse regions.
Offer the skill as a free component in macOS productivity tools, with premium features like advanced voice quality detection or custom voice packs available for purchase. Targets developers and businesses seeking to enhance apps with offline voice capabilities.
License the skill to enterprises for integration into internal systems like call centers or training platforms, providing local, secure voice processing without cloud dependencies. Includes support for multiple languages and compliance with data privacy regulations.
Sell the skill as part of a developer toolkit for macOS app creators, including documentation and support for implementing STT and TTS in applications. Appeals to indie developers and startups building voice-enabled tools for education or media.
💬 Integration Tip
Ensure all required binaries like yap and ffmpeg are installed via Homebrew, and use the voices.mjs script to check voice availability before TTS calls to avoid silent fallbacks.
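The check-before-TTS advice can be sketched as a guard function. Everything here is a hypothetical illustration: speakSafely and missingVoiceMessage are invented names, and checkVoice and runTts are injected callbacks because the skill description does not say how voices.mjs check reports readiness (exit code or printed text), so that detail is left to the caller.

```javascript
// Hypothetical guard, not part of the skill.

// Builds the instruction to show when a requested voice is not installed.
function missingVoiceMessage(voice) {
  return `Voice ${voice} is not downloaded. Go to System Settings → ` +
    "Accessibility → Spoken Content → System Voice → Manage Voices to download it.";
}

// checkVoice(voice) -> Promise<boolean>: assumed to wrap `voices.mjs check "<name>"`.
// runTts(text, voice) -> Promise<string>: assumed to wrap tts.mjs and resolve to the audio path.
async function speakSafely(text, voice, { checkVoice, runTts }) {
  // Only a named voice needs checking; without one, tts.mjs auto-selects a voice.
  if (voice !== undefined && !(await checkVoice(voice))) {
    // Stop here instead of letting say fall back silently to a default voice.
    return { ok: false, message: missingVoiceMessage(voice) };
  }
  return { ok: true, path: await runTts(text, voice) };
}
```

Returning a result object rather than throwing lets the caller relay the download instructions to the user as a normal reply.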