sag: ElevenLabs text-to-speech with mac-style `say` UX.
Install via ClawdBot CLI:
clawdbot install steipete/sag

Install sag (brew):
brew install steipete/tap/sag

Use sag for ElevenLabs TTS with local playback.

Requires an API key:
- ELEVENLABS_API_KEY (preferred)
- SAG_API_KEY also supported by the CLI

Quick start:
sag "Hello there"
sag speak -v "Roger" "Hello"
sag voices
sag prompting (model-specific tips)

Model notes:
- eleven_v3 (expressive)
- eleven_multilingual_v2
- eleven_flash_v2_5

Pronunciation + delivery rules:
- --normalize auto (or off if it harms names).
- --lang en|de|fr|... to guide normalization.
- [...] not supported; use [pause], [short pause], [long pause].
- [...] supported; not exposed in sag.

v3 audio tags (put at the start of a line):
- [whispers], [shouts], [sings]
- [laughs], [starts laughing], [sighs], [exhales]
- [sarcastic], [curious], [excited], [crying], [mischievously]

Example:
sag "[whispers] keep this quiet. [short pause] ok?"
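The placement rule above (tags go at the start of a line) can be sketched as a tiny helper that builds a tagged prompt for a single sag call. `tag_line` and `build_prompt` are illustrative names, not part of sag itself:

```python
def tag_line(tag, text):
    # v3 audio tags take effect at the start of a line,
    # so always prepend the tag rather than embed it mid-sentence.
    return f"[{tag}] {text}"

def build_prompt(*lines):
    # Join tagged lines into one string to pass to `sag`.
    return " ".join(lines)
```

For example, `build_prompt(tag_line("whispers", "keep this quiet."), "[short pause] ok?")` reproduces the sample prompt above.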
Voice defaults:
- ELEVENLABS_VOICE_ID or SAG_VOICE_ID
- Confirm voice + speaker before long output.
When Peter asks for a "voice" reply (e.g., "crazy scientist voice", "explain in voice"), generate audio and send it:
# Generate audio file
sag -v Clawd -o /tmp/voice-reply.mp3 "Your message here"
# Then include in reply:
# MEDIA:/tmp/voice-reply.mp3
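The two-step flow above (generate a file, then reference it with a MEDIA: line) can be sketched in Python. `voice_reply_command` is a hypothetical helper; actually running the command still requires sag to be installed:

```python
def voice_reply_command(text, voice="Clawd", out="/tmp/voice-reply.mp3"):
    # Step 1: the sag invocation that writes the MP3.
    cmd = ["sag", "-v", voice, "-o", out, text]
    # Step 2: the line to include in the reply so the file is attached.
    media_line = f"MEDIA:{out}"
    return cmd, media_line
```

Once sag is on PATH, pass `cmd` to `subprocess.run(cmd, check=True)` and append `media_line` to the reply.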
Voice character tips:
- [excited] tags, dramatic pauses [short pause], vary intensity
- [whispers] or slower pacing
- [sings] or [shouts] sparingly

Default voice for Clawd: lj2rcrvANS3gaWWnczSX (or just -v Clawd)
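The defaults above suggest a simple precedence: an explicit -v wins, then the voice env vars, then Clawd's ID. This sketch assumes that order (verify against sag's own docs); `resolve_voice` is an illustrative name:

```python
import os

def resolve_voice(cli_voice=None, env=None, default="lj2rcrvANS3gaWWnczSX"):
    # Assumed precedence: -v flag, then ELEVENLABS_VOICE_ID,
    # then SAG_VOICE_ID, then the Clawd default voice ID.
    env = os.environ if env is None else env
    return (cli_voice
            or env.get("ELEVENLABS_VOICE_ID")
            or env.get("SAG_VOICE_ID")
            or default)
```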
Generated Mar 1, 2026
Producers use sag to generate expressive voiceovers for YouTube videos, podcasts, and social media clips. It supports audio tags like [whispers] or [excited] for dramatic effects, enhancing engagement without needing professional recording studios.
Educators integrate sag to create multilingual audio content for online courses, using stable models like eleven_multilingual_v2. It helps deliver clear pronunciations with language bias settings, making learning accessible across regions.
Businesses deploy sag to generate voice responses in chatbots or IVR systems, with fast models like eleven_flash_v2_5 for quick playback. Audio tags add emotional nuance, improving user experience in support interactions.
Developers build apps that convert text to speech using sag's local playback, offering customizable voices and pauses. It aids in reading digital content aloud, with normalization features for numbers and URLs to ensure clarity.
Game designers and writers use sag to produce character dialogues with specific voices, leveraging v3 audio tags for emotional depth. It allows real-time audio generation, enhancing immersive experiences in interactive media.
Offer tiered subscriptions providing access to sag's TTS features, with higher tiers including advanced models and voice customization. Revenue comes from monthly fees based on usage limits and premium support.
License sag to enterprises for embedding into their products, such as e-learning platforms or customer service tools. Revenue is generated through one-time licensing fees or annual contracts with technical support.
Provide a free version with basic TTS capabilities, while charging for advanced features like multilingual models, audio tags, and high-volume usage. Revenue streams from upgrades and in-app purchases.
💬 Integration Tip
Set the ELEVENLABS_API_KEY environment variable and install sag via brew for quick setup; use voice defaults like -v Clawd to streamline audio generation in applications.
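A minimal preflight for that setup, assuming only what the tip states (sag on PATH, ELEVENLABS_API_KEY or SAG_API_KEY set). `sag_ready` is an illustrative helper, with `env` and `which` injectable for testing:

```python
import os
import shutil

def sag_ready(env=None, which=shutil.which):
    # Key check: ELEVENLABS_API_KEY preferred, SAG_API_KEY also accepted.
    env = os.environ if env is None else env
    has_key = bool(env.get("ELEVENLABS_API_KEY") or env.get("SAG_API_KEY"))
    # Binary check: `brew install steipete/tap/sag` puts sag on PATH.
    return which("sag") is not None and has_key
```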