say
Text-to-Speech via the macOS say command with Siri Natural Voices. Use for generating speech audio, TTS clips, or speaking text aloud on macOS.
Install via ClawdBot CLI:
clawdbot install tobihagemann/say
Use say for on-device text-to-speech on macOS.
Siri voices are the best macOS TTS voices but cannot be selected via -v. Instead, run say without -v; it then uses the system default voice. Switch languages via defaults write:
# Switch to German
defaults write com.apple.speech.voice.prefs SystemTTSLanguage -string "de"
say "Hallo, wie geht's?" -o output_de.aiff
# Switch to Chinese (Mandarin)
defaults write com.apple.speech.voice.prefs SystemTTSLanguage -string "cmn"
say "你好，世界" -o output_zh.aiff
No process restart is needed: the next say invocation picks up the new language immediately.
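The switch-then-speak pattern above can be wrapped in a small helper that puts the previous language back afterwards. This is a minimal sketch, not part of say itself: the function name speak_in_lang is hypothetical, and it assumes the SystemTTSLanguage key shown above is the only setting that needs to change.

```shell
#!/usr/bin/env bash
# Sketch: speak text in a given language, then restore the previous setting.
# speak_in_lang is a hypothetical helper name, not a say or defaults feature.

speak_in_lang() {
  local lang="$1" text="$2" out="$3"
  local domain="com.apple.speech.voice.prefs" key="SystemTTSLanguage"
  local prev rc

  # Remember the current language; stays empty if the key was never set.
  prev="$(defaults read "$domain" "$key" 2>/dev/null || true)"

  # Switch language, then render the text to an AIFF file.
  defaults write "$domain" "$key" -string "$lang"
  say "$text" -o "$out"
  rc=$?

  # Restore the earlier language only if there was one to restore.
  if [ -n "$prev" ]; then
    defaults write "$domain" "$key" -string "$prev"
  fi
  return "$rc"
}
```

Example call: speak_in_lang de "Hallo, wie geht's?" output_de.aiff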
Download the desired Siri voices first in System Settings > Accessibility > Spoken Content and set them as the system voice for each language.
Check which voices are currently configured:
defaults read com.apple.Accessibility SpokenContentDefaultVoiceSelectionsByLanguage
For non-Siri voices, use -v directly:
say -v 'Tingting (Enhanced)' "你好，世界"
say -v '?' # list all installed voices (Siri voices not listed)
say -o output.aiff "Hello world"
ffmpeg -y -i output.aiff -ar 22050 -ac 1 output.wav # convert to WAV
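The two steps above (render to AIFF, convert to WAV) can be combined into one helper. The following is a sketch under assumptions: tts_to_wav is a hypothetical name, the optional third argument is passed to -r as a speaking rate, and the intermediate AIFF file is deleted after a successful conversion.

```shell
#!/usr/bin/env bash
# Sketch: render text with say, then convert to mono 22.05 kHz WAV via ffmpeg.
# tts_to_wav is a hypothetical helper name.

tts_to_wav() {
  local text="$1" wav="$2" rate="${3:-}"

  # Fail early if ffmpeg is missing.
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg not found; install it before converting" >&2
    return 1
  fi

  # Intermediate AIFF file next to the requested WAV path.
  local aiff="${wav%.wav}.aiff"

  # Pass -r only when a rate was given, so the system default applies otherwise.
  if [ -n "$rate" ]; then
    say -r "$rate" -o "$aiff" "$text" || return 1
  else
    say -o "$aiff" "$text" || return 1
  fi

  # Same conversion settings as the command above, with quieter logging.
  ffmpeg -y -loglevel error -i "$aiff" -ar 22050 -ac 1 "$wav" || return 1
  rm -f "$aiff"
}
```

Example call: tts_to_wav "Hello world" output.wav 150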
-v: Select a non-Siri voice
-r: Speaking rate in words per minute (e.g. -r 150)
-o: Save to AIFF file instead of playing aloud
say adds natural pauses at punctuation; no manual sentence splitting needed
defaults write calls
Generated Mar 1, 2026
Content creators use the skill to generate voiceovers for videos or podcasts in multiple languages by switching system TTS languages. It enables quick audio clip creation without manual recording, ideal for educational or marketing content.
Developers integrate the skill into apps to provide text-to-speech functionality for visually impaired users on macOS. It supports natural Siri voices for improved user experience in reading apps or screen readers.
Educators create audio materials for language courses by generating speech in target languages like German or Mandarin. The skill allows for easy pronunciation examples and listening exercises without external TTS services.
System administrators set up automated spoken notifications for monitoring tools or scripts on macOS servers. It converts text logs or alerts into speech for real-time auditory feedback in control rooms.
Designers and engineers use the skill to quickly prototype voice responses for chatbots or smart devices. It provides a low-cost way to test audio outputs before investing in commercial TTS APIs.
Offer voiceover creation for clients needing multilingual audio for videos or presentations. Use the skill to generate clips efficiently, then edit and deliver them for a fee, leveraging low operational costs.
Develop a platform that automates audio localization for businesses expanding globally. Integrate the skill to convert text into speech in various languages, charging per audio minute or through monthly plans.
Create and sell language learning apps or software that include TTS features powered by this skill. License the integrated solution to schools or online course providers, with revenue from one-time purchases or annual licenses.
💬 Integration Tip
Ensure ffmpeg is installed for format conversion and set system TTS languages once per session to minimize defaults write calls for batch processing.
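One way to keep defaults write calls low in a batch is to group jobs by language first, so each language is set once. This is a rough sketch only: batch_say is a hypothetical helper, and the pipe-separated input format (language, output file, text) is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: batch TTS with one defaults write per language.
# batch_say is a hypothetical helper; it reads "lang|output.aiff|text" lines from stdin.

batch_say() {
  local current="" lang out text

  # Stable sort on the language field so jobs for one language run together.
  sort -s -t '|' -k1,1 | while IFS='|' read -r lang out text; do
    [ -n "$lang" ] || continue

    # Only touch the system setting when the language actually changes.
    if [ "$lang" != "$current" ]; then
      defaults write com.apple.speech.voice.prefs SystemTTSLanguage -string "$lang"
      current="$lang"
    fi

    # Redirect stdin so say cannot consume the remaining job lines.
    say "$text" -o "$out" </dev/null
  done
}
```

Example call, assuming a jobs.txt file in that format: batch_say < jobs.txt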