elevenlabs
Text-to-speech, sound effects, music generation, voice management, and quota checks via the ElevenLabs API. Use when generating audio with ElevenLabs or managing voices.
Install via ClawdBot CLI:
clawdbot install odrobnik/elevenlabs
Core tools for interacting with the ElevenLabs API for sound generation, music, and voice management.
See SETUP.md for prerequisites and setup instructions.
| Model | ID | Use Case |
|-------|----|----------|
| Eleven v3 | eleven_v3 | Best for expressive/creative audio. Supports audio tags (square brackets): [laughs], [sighs], [whispers], [excited], [grumpy voice], [clears throat], etc. Use for storytelling, characters, demos. |
| Multilingual v2 | eleven_multilingual_v2 | Stable multilingual. No audio tags. Good for straightforward narration. |
| Turbo v2.5 | eleven_turbo_v2_5 | Low-latency, good for non-English (German TTS). Required for realtime/conversational. |
| Flash v2.5 | eleven_flash_v2_5 | Fastest, lowest cost. |
[laughs], [chuckles], [sighs], [clears throat], [whispers], [shouts]
[excited], [sad], [angry], [warmly], [deadpan], [sarcastic]
[grumpy voice], [philosophical], [whiny voice], [resigned]
[laughs hard], [sighs deeply], [pause]
Tags can be placed anywhere in text. Combine freely. v3 understands emotional context deeply.
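Because only Eleven v3 interprets audio tags, it can help to detect or strip them before sending the same text to another model. The following is a minimal sketch, not part of the skill's scripts; the helper names `find_audio_tags` and `strip_audio_tags` are hypothetical, and the pattern simply treats any square-bracketed run as a tag.

```python
import re

# Matches a square-bracket audio tag such as [laughs] or [grumpy voice].
AUDIO_TAG = re.compile(r"\[([^\[\]]+)\]")


def find_audio_tags(text: str) -> list[str]:
    """Return the tag names found in the text, in order of appearance."""
    return [name.strip() for name in AUDIO_TAG.findall(text)]


def strip_audio_tags(text: str) -> str:
    """Remove audio tags and collapse the whitespace they leave behind."""
    without_tags = AUDIO_TAG.sub("", text)
    return re.sub(r"\s+", " ", without_tags).strip()


line = "[excited] We made it! [laughs] I can't believe it. [sighs deeply]"
print(find_audio_tags(line))
print(strip_audio_tags(line))
```

Stripping is useful when the same script is reused with Multilingual v2, which has no audio tag support.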
All scripts support multiple output formats via --format:
| Format | Description |
|--------|-------------|
| mp3_44100_128 | MP3, 44.1kHz, 128kbps (default) |
| mp3_44100_192 | MP3, 44.1kHz, 192kbps |
| mp3_44100_96 | MP3, 44.1kHz, 96kbps |
| mp3_44100_64 | MP3, 44.1kHz, 64kbps |
| mp3_44100_32 | MP3, 44.1kHz, 32kbps |
| mp3_24000_48 | MP3, 24kHz, 48kbps |
| mp3_22050_32 | MP3, 22.05kHz, 32kbps |
| opus_48000_192 | Opus, 48kHz, 192kbps (best for AirPlay) |
| opus_48000_128 | Opus, 48kHz, 128kbps |
| opus_48000_96 | Opus, 48kHz, 96kbps |
| opus_48000_64 | Opus, 48kHz, 64kbps |
| opus_48000_32 | Opus, 48kHz, 32kbps |
| pcm_16000 | Raw PCM, 16kHz |
| pcm_22050 | Raw PCM, 22.05kHz |
| pcm_24000 | Raw PCM, 24kHz |
| alaw_8000 | A-law, 8kHz (telephony) |
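The format IDs in the table follow a `codec_samplerate[_bitrate]` pattern, so they can be split mechanically. The sketch below illustrates that pattern only; `AudioFormat` and `parse_output_format` are hypothetical names and are not taken from the skill's scripts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioFormat:
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None for formats without a bitrate field


def parse_output_format(format_id: str) -> AudioFormat:
    """Split an ID such as 'mp3_44100_128' or 'pcm_16000' into its fields."""
    parts = format_id.split("_")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"unrecognized output format: {format_id!r}")
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return AudioFormat(parts[0], int(parts[1]), bitrate)


print(parse_output_format("mp3_44100_128"))
print(parse_output_format("pcm_16000"))
```

A parser like this only checks the shape of the ID; whether a given combination is accepted is still decided by the API and the subscription tier.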
speech.py
Text-to-speech using ElevenLabs voices.
# Basic usage
python3 {baseDir}/scripts/speech.py "Hello world" -v <voice_id> -o output.mp3
# With format option
python3 {baseDir}/scripts/speech.py "Hello world" -v <voice_id> -o output.pcm --format pcm_44100
# With voice settings
python3 {baseDir}/scripts/speech.py "Hello" -v <voice_id> -o out.mp3 --stability 0.7 --similarity 0.8
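For orientation, the request that a text-to-speech call boils down to can be sketched with the standard library. This is an illustration under assumptions, not the contents of speech.py: the function name `build_tts_request`, the default model, and the mapping of `--similarity` to the API's `similarity_boost` field are all assumed here. The request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text, voice_id, api_key,
                      model_id="eleven_multilingual_v2",  # assumed default
                      output_format="mp3_44100_128",
                      stability=None, similarity=None):
    """Build (but do not send) a POST request for the text-to-speech endpoint."""
    body = {"text": text, "model_id": model_id}
    settings = {}
    if stability is not None:
        settings["stability"] = stability
    if similarity is not None:
        # Assumption: the CLI's --similarity maps to similarity_boost.
        settings["similarity_boost"] = similarity
    if settings:
        body["voice_settings"] = settings
    query = urllib.parse.urlencode({"output_format": output_format})
    voice = urllib.parse.quote(voice_id, safe="")
    return urllib.request.Request(
        f"{API_BASE}/text-to-speech/{voice}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_tts_request("Hello", "VOICE_ID", "API_KEY",
                        stability=0.7, similarity=0.8)
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` and writing the response bytes to a file would produce the audio; that step is left out so the sketch runs without a key or network access.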
sfx.py
Generate sound effects and short audio clips.
# Generate a sound
python3 {baseDir}/scripts/sfx.py "Cinematic boom" -o boom.mp3
# Generate a loop
python3 {baseDir}/scripts/sfx.py "Lo-fi hip hop beat" --duration 10 --loop -o beat.mp3
# Different format
python3 {baseDir}/scripts/sfx.py "Whoosh" -o whoosh.pcm --format pcm_44100
music.py
Generate full musical compositions or instrumental tracks.
# Generate instrumental intro
python3 {baseDir}/scripts/music.py --prompt "Upbeat 6s news intro sting, instrumental" --length-ms 6000 -o intro.mp3
# Generate background bed
python3 {baseDir}/scripts/music.py --prompt "Soft ambient synth pad" --length-ms 30000 -o bed.mp3
# High quality MP3
python3 {baseDir}/scripts/music.py --prompt "Jazz piano" --length-ms 10000 -o jazz.mp3 --output-format mp3_44100_192
voices.py
List available voices and their IDs.
# List voices
python3 {baseDir}/scripts/voices.py
# JSON output
python3 {baseDir}/scripts/voices.py --json
voiceclone.py
Create instant voice clones from audio samples.
Security: by default this script will only read files from:
~/.openclaw/elevenlabs/voiceclone-samples/
Copy your samples there (or pass --sample-dir). Reading files outside the sample directory is blocked.
# Clone from audio files (put samples into ~/.openclaw/elevenlabs/voiceclone-samples)
python3 {baseDir}/scripts/voiceclone.py --name "MyVoice" --files sample1.mp3 sample2.mp3
# Use a custom sample dir
python3 {baseDir}/scripts/voiceclone.py --name "Andi" --sample-dir ./samples --files a.m4a b.m4a --language de --gender male
# With description and noise removal
python3 {baseDir}/scripts/voiceclone.py --name "Andi" --files a.m4a b.m4a --description "German male" --denoise
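The sample-directory restriction described above is a path containment check. How voiceclone.py implements it is not shown here; the following is one plausible sketch (Python 3.9+), with `resolve_sample` as a hypothetical helper name.

```python
from pathlib import Path

DEFAULT_SAMPLE_DIR = Path("~/.openclaw/elevenlabs/voiceclone-samples")


def resolve_sample(name: str, sample_dir: Path = DEFAULT_SAMPLE_DIR) -> Path:
    """Resolve a sample file name and refuse anything outside sample_dir."""
    base = sample_dir.expanduser().resolve()
    # resolve() normalizes '..' segments and follows symlinks, so
    # traversal attempts end up outside base and are caught below.
    candidate = (base / name).resolve()
    if not candidate.is_relative_to(base):
        raise PermissionError(f"{name!r} is outside the sample directory {base}")
    return candidate


print(resolve_sample("sample1.mp3"))
```

Resolving before comparing is the important design choice: a plain string prefix check would let `../` segments or symlinks slip through.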
quota.py
Check subscription quota and usage statistics.
# Show current quota
python3 {baseDir}/scripts/quota.py
# Include usage breakdown by voice
python3 {baseDir}/scripts/quota.py --usage
# Last 7 days usage
python3 {baseDir}/scripts/quota.py --usage --days 7
# JSON output
python3 {baseDir}/scripts/quota.py --json
Output:
ElevenLabs Quota
=======================================
Plan: pro (active), annual
Characters: 66.6K / 500.0K (13.3%)
[████░░░░░░░░░░░░░░░░░░░░░░░░░░]
Resets: 2026-02-18 (29 days)
Voices: 22 / 160 (IVC: ✓)
Pro Voice: 0 / 1 (PVC: ✓)
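The character line and progress bar in the output above follow from simple arithmetic on the used and total counts. The sketch below reproduces that arithmetic with an ASCII bar; `format_count` and `quota_line` are hypothetical names, and the 30-character bar width is an assumption rather than something read from quota.py.

```python
def format_count(n: int) -> str:
    """Render counts of 1000 or more in thousands, e.g. 66600 -> '66.6K'."""
    return f"{n / 1000:.1f}K" if n >= 1000 else str(n)


def quota_line(used: int, limit: int, width: int = 30) -> str:
    """Return the 'Characters' line plus an ASCII progress bar."""
    fraction = used / limit if limit else 0.0
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to the 0..1 range
    filled = round(fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    return (f"Characters: {format_count(used)} / {format_count(limit)} "
            f"({fraction:.1%})\n[{bar}]")


print(quota_line(66_600, 500_000))
```

With 66,600 of 500,000 characters used, the fraction is 0.1332, which rounds to 13.3% and fills 4 of 30 bar cells.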
Generated Mar 1, 2026
Generate expressive voiceovers for podcast episodes using the Eleven v3 model, with audio tags like [excited] or [whispers] to add emotional depth. Create custom intro music and sound effects for branding, and clone host voices for consistent audio across episodes.
Produce multilingual narration for online courses using the Multilingual v2 model for clear, straightforward explanations. Add sound effects to enhance engagement in interactive modules, and generate background music for video lessons to maintain learner focus.
Use the low-latency Turbo v2.5 model for real-time voice responses in chatbots or IVR systems, including in languages such as German. Clone customer voices for personalized interactions, and use quota checks to monitor API usage and manage costs.
Create dynamic character dialogue with the Eleven v3 model, using audio tags such as [grumpy voice] or [sighs] for immersive storytelling. Generate sound effects and background music loops for game environments, and clone voice actors for consistent character audio across updates.
Produce high-quality voiceovers for commercials using the Eleven v3 model, with emotional tags like [warmly] to connect with audiences. Generate short music stings for brand intros, and clone spokesperson voices for unified marketing campaigns across different media formats.
Integrate the skill into a subscription-based platform offering audio generation services to content creators. Charge users based on API usage tiers, with revenue from monthly subscriptions and pay-per-character models for high-volume clients.
Use the skill to offer custom voice cloning, sound design, and music generation services on freelance marketplaces. Generate revenue by charging per project or hourly rates for creating audio assets for clients in industries like podcasting or e-learning.
Deploy the skill within large organizations for internal training, customer support, or marketing automation. Monetize through enterprise licensing agreements, providing tailored support and integration services, with revenue from annual contracts and customization fees.
Integration Tip
Ensure the ELEVENLABS_API_KEY environment variable is set and use the appropriate model (e.g., Turbo v2.5 for real-time applications) to optimize performance and cost.
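A small guard can surface a missing key before any request is made. This is a sketch only; `get_api_key` is a hypothetical helper, and the skill's scripts may handle the variable differently.

```python
import os
from typing import Mapping


def get_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the ElevenLabs API key or fail with a clear message."""
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set; see SETUP.md")
    return key


# Example with an explicit mapping, so no real key is needed to try it.
print(get_api_key({"ELEVENLABS_API_KEY": "example-key"}))
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.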