elevenlabs-voices
High-quality voice synthesis with 18 personas, 32 languages, sound effects, batch processing, and voice design using the ElevenLabs API.
Install via ClawdBot CLI:
clawdbot install robbyczgw-cla/elevenlabs-voices
Comprehensive voice synthesis toolkit using the ElevenLabs API.
When you first use this skill (no config.json exists), run the interactive setup wizard:
python3 scripts/setup.py
The wizard will guide you through:
Privacy: Your API key is stored locally in config.json only. It never leaves your machine and is automatically excluded from git via .gitignore.
To reconfigure at any time, simply run the setup wizard again.
| Voice | Accent | Gender | Persona | Best For |
|-------|--------|--------|---------|----------|
| rachel | 🇺🇸 US | female | warm | Conversations, tutorials |
| adam | 🇺🇸 US | male | narrator | Documentaries, audiobooks |
| bella | 🇺🇸 US | female | professional | Business, presentations |
| brian | 🇺🇸 US | male | comforting | Meditation, calm content |
| george | 🇬🇧 UK | male | storyteller | Audiobooks, storytelling |
| alice | 🇬🇧 UK | female | educator | Tutorials, explanations |
| callum | 🇺🇸 US | male | trickster | Playful, gaming |
| charlie | 🇦🇺 AU | male | energetic | Sports, motivation |
| jessica | 🇺🇸 US | female | playful | Social media, casual |
| lily | 🇬🇧 UK | female | actress | Drama, elegant content |
| matilda | 🇺🇸 US | female | professional | Corporate, news |
| river | 🇺🇸 US | neutral | neutral | Inclusive, informative |
| roger | 🇺🇸 US | male | casual | Podcasts, relaxed |
| daniel | 🇬🇧 UK | male | broadcaster | News, announcements |
| eric | 🇺🇸 US | male | trustworthy | Business, corporate |
| chris | 🇺🇸 US | male | friendly | Tutorials, approachable |
| will | 🇺🇸 US | male | optimist | Motivation, uplifting |
| liam | 🇺🇸 US | male | social | YouTube, social media |
- default → rachel (warm, friendly)
- narrator → adam (documentaries)
- professional → matilda (corporate)
- storyteller → george (audiobooks)
- educator → alice (tutorials)
- calm → brian (meditation)
- energetic → liam (social media)
- trustworthy → eric (business)
- neutral → river (inclusive)
- british → george
- australian → charlie
- broadcaster → daniel (news)

The multilingual v2 model supports these languages:
| Code | Language | Code | Language |
|------|----------|------|----------|
| en | English | pl | Polish |
| de | German | nl | Dutch |
| es | Spanish | sv | Swedish |
| fr | French | da | Danish |
| it | Italian | fi | Finnish |
| pt | Portuguese | no | Norwegian |
| ru | Russian | tr | Turkish |
| uk | Ukrainian | cs | Czech |
| ja | Japanese | sk | Slovak |
| ko | Korean | hu | Hungarian |
| zh | Chinese | ro | Romanian |
| ar | Arabic | bg | Bulgarian |
| hi | Hindi | hr | Croatian |
| ta | Tamil | el | Greek |
| id | Indonesian | ms | Malay |
| vi | Vietnamese | th | Thai |
# Synthesize in German
python3 scripts/tts.py --text "Guten Tag!" --voice rachel --lang de
# Synthesize in French
python3 scripts/tts.py --text "Bonjour le monde!" --voice adam --lang fr
# List all languages
python3 scripts/tts.py --languages
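A caller may want to reject an unsupported language code before spending a request. The sketch below is one way to do that, assuming the 32 codes in the table above are the complete set; `check_lang` is a hypothetical helper, not a function from the skill.

```python
# Codes copied from the language table in this document.
SUPPORTED_LANGS = {
    "en", "de", "es", "fr", "it", "pt", "ru", "uk",
    "ja", "ko", "zh", "ar", "hi", "ta", "id", "vi",
    "pl", "nl", "sv", "da", "fi", "no", "tr", "cs",
    "sk", "hu", "ro", "bg", "hr", "el", "ms", "th",
}


def check_lang(code: str) -> str:
    """Return the normalized code, or raise ValueError if it is not in the table.

    Hypothetical helper for illustration; the real scripts may validate differently.
    """
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGS:
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized
```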
# List all voices
python3 scripts/tts.py --list
# Generate speech
python3 scripts/tts.py --text "Hello world" --voice rachel --output hello.mp3
# Use a preset
python3 scripts/tts.py --text "Breaking news..." --voice broadcaster --output news.mp3
# Multi-language
python3 scripts/tts.py --text "Bonjour!" --voice rachel --lang fr --output french.mp3
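The `--voice` flag accepts either a voice name (`rachel`) or a preset (`broadcaster`). A lookup along the following lines would cover both. The mapping is copied from the preset list in this document, while `resolve_voice` and its precedence rule (voice names win over presets) are assumptions for illustration.

```python
# Preset-to-voice mapping as listed in this document.
PRESETS = {
    "default": "rachel",
    "narrator": "adam",
    "professional": "matilda",
    "storyteller": "george",
    "educator": "alice",
    "calm": "brian",
    "energetic": "liam",
    "trustworthy": "eric",
    "neutral": "river",
    "british": "george",
    "australian": "charlie",
    "broadcaster": "daniel",
}

# The 18 voice names from the voice table.
VOICES = {
    "rachel", "adam", "bella", "brian", "george", "alice",
    "callum", "charlie", "jessica", "lily", "matilda", "river",
    "roger", "daniel", "eric", "chris", "will", "liam",
}


def resolve_voice(name: str) -> str:
    """Map a voice name or preset to a voice name (hypothetical helper)."""
    key = name.strip().lower()
    if key in VOICES:      # assumption: an exact voice name takes precedence
        return key
    if key in PRESETS:
        return PRESETS[key]
    raise ValueError(f"Unknown voice or preset: {name!r}")
```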
Generate audio with real-time streaming (good for long texts):
# Stream audio as it generates
python3 scripts/tts.py --text "This is a long story..." --voice adam --stream
# Streaming with custom output
python3 scripts/tts.py --text "Chapter one..." --voice george --stream --output chapter1.mp3
Process multiple texts from a file:
# From newline-separated text file
python3 scripts/tts.py --batch texts.txt --voice rachel --output-dir ./audio
# From JSON file
python3 scripts/tts.py --batch batch.json --output-dir ./output
JSON batch format:
[
{"text": "First line", "voice": "rachel", "output": "line1.mp3"},
{"text": "Second line", "voice": "adam", "output": "line2.mp3"},
{"text": "Third line"}
]
Simple text format (one per line):
Hello, this is the first sentence.
This is the second sentence.
And this is the third.
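Both batch formats reduce to the same list of jobs. The sketch below shows one way to normalize them. The third JSON entry above suggests `voice` and `output` are optional, but the fallback values used here (`default_voice` and the `batch_001.mp3` naming pattern) are hypothetical and may not match what `tts.py` does.

```python
import json
from pathlib import Path


def load_batch(path, default_voice="rachel"):
    """Read a .json or newline-separated batch file into a list of job dicts.

    Illustrative only: the default voice and output names are assumptions.
    """
    raw = Path(path).read_text(encoding="utf-8")
    if str(path).lower().endswith(".json"):
        entries = json.loads(raw)
    else:
        # Plain text: one job per non-empty line.
        entries = [{"text": line.strip()} for line in raw.splitlines() if line.strip()]

    jobs = []
    for index, entry in enumerate(entries, start=1):
        jobs.append({
            "text": entry["text"],
            "voice": entry.get("voice", default_voice),
            "output": entry.get("output", f"batch_{index:03d}.mp3"),
        })
    return jobs
```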
# Show usage stats and cost estimates
python3 scripts/tts.py --stats
# Reset statistics
python3 scripts/tts.py --reset-stats
Generate AI-powered sound effects from text descriptions:
# Generate a sound effect
python3 scripts/sfx.py --prompt "Thunder rumbling in the distance"
# With specific duration (0.5-22 seconds)
python3 scripts/sfx.py --prompt "Cat meowing" --duration 3 --output cat.mp3
# Adjust prompt influence (0.0-1.0)
python3 scripts/sfx.py --prompt "Footsteps on gravel" --influence 0.5
# Batch SFX generation
python3 scripts/sfx.py --batch sounds.json --output-dir ./sfx
# Show prompt examples
python3 scripts/sfx.py --examples
Example prompts:
Create custom voices from text descriptions:
# Basic voice design
python3 scripts/voice-design.py --gender female --age middle_aged --accent american \
--description "A warm, motherly voice"
# With custom preview text
python3 scripts/voice-design.py --gender male --age young --accent british \
--text "Welcome to the adventure!" --output preview.mp3
# Save to your ElevenLabs library
python3 scripts/voice-design.py --gender female --age young --accent american \
--description "Energetic podcast host" --save "MyHost"
# List all design options
python3 scripts/voice-design.py --options
Voice Design Options:
| Option | Values |
|--------|--------|
| Gender | male, female, neutral |
| Age | young, middle_aged, old |
| Accent | american, british, african, australian, indian, latin, middle_eastern, scandinavian, eastern_european |
| Accent Strength | 0.3-2.0 (subtle to strong) |
Customize how words are pronounced:
Edit pronunciations.json:
{
"rules": [
{
"word": "OpenClaw",
"replacement": "Open Claw",
"comment": "Pronounce as two words"
},
{
"word": "API",
"replacement": "A P I",
"comment": "Spell out acronym"
}
]
}
Usage:
# Pronunciations are applied automatically
python3 scripts/tts.py --text "The OpenClaw API is great" --voice rachel
# Disable pronunciations
python3 scripts/tts.py --text "The API is great" --voice rachel --no-pronunciations
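The rules above amount to a text substitution applied before synthesis. A minimal sketch follows, assuming whole-word, case-sensitive matching; the skill's actual matching behavior is not documented here, and `apply_pronunciations` is a hypothetical name.

```python
import json
import re


def apply_pronunciations(text, rules):
    """Replace each rule's word with its replacement, matching whole words only."""
    for rule in rules:
        pattern = r"\b" + re.escape(rule["word"]) + r"\b"
        # A function replacement keeps backslashes in the replacement literal.
        text = re.sub(pattern, lambda _match, rep=rule["replacement"]: rep, text)
    return text


config = json.loads("""
{"rules": [
  {"word": "OpenClaw", "replacement": "Open Claw"},
  {"word": "API", "replacement": "A P I"}
]}
""")

print(apply_pronunciations("The OpenClaw API is great", config["rules"]))
# prints: The Open Claw A P I is great
```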
The skill tracks your character usage and estimates costs:
python3 scripts/tts.py --stats
Output:
ElevenLabs Usage Statistics
Total Characters: 15,230
Total Requests: 42
Since: 2024-01-15
💰 Estimated Costs:
Starter $4.57 ($0.30/1k chars)
Creator $3.66 ($0.24/1k chars)
Pro $2.74 ($0.18/1k chars)
Scale $1.68 ($0.11/1k chars)
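The estimates are the character count divided by 1,000 and multiplied by each plan's per-1k rate. The sketch below reproduces the figures in the sample output. The rates are taken from that output and may not match current ElevenLabs pricing; `estimate_costs` is a hypothetical helper.

```python
# Per-1k-character rates as shown in the sample output above.
RATES_PER_1K = {
    "Starter": 0.30,
    "Creator": 0.24,
    "Pro": 0.18,
    "Scale": 0.11,
}


def estimate_costs(total_characters):
    """Return {plan: estimated cost in USD}, rounded to cents."""
    return {
        plan: round(total_characters / 1000 * rate, 2)
        for plan, rate in RATES_PER_1K.items()
    }


for plan, cost in estimate_costs(15_230).items():
    print(f"{plan:<8} ${cost:.2f}")
```

For 15,230 characters this gives $4.57, $3.66, $2.74, and $1.68, matching the sample.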
OpenClaw has built-in TTS support that can use ElevenLabs. Configure in ~/.openclaw/openclaw.json:
{
"tts": {
"enabled": true,
"provider": "elevenlabs",
"elevenlabs": {
"apiKey": "your-api-key-here",
"voice": "rachel",
"model": "eleven_multilingual_v2"
}
}
}
In OpenClaw conversations:
- Use /tts on to enable automatic TTS
- Use the tts tool directly for one-off speech

# OpenClaw can run these scripts directly
exec python3 /path/to/skills/elevenlabs-voices/scripts/tts.py --text "Hello" --voice rachel
The scripts look for API key in this order:
1. ELEVEN_API_KEY or ELEVENLABS_API_KEY environment variable
2. OpenClaw config (~/.openclaw/openclaw.json → tts.elevenlabs.apiKey)
3. .env file

Create .env file:
echo 'ELEVEN_API_KEY=your-key-here' > .env
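The lookup order above can be sketched as a single function that returns the first key it finds. This is an illustration, not the scripts' actual code: `find_api_key` and its parameters are hypothetical, and the `.env` parsing handles only simple `NAME=value` lines.

```python
import json
import os
from pathlib import Path

KEY_NAMES = ("ELEVEN_API_KEY", "ELEVENLABS_API_KEY")


def find_api_key(env=None, openclaw_config=None, dotenv_path=".env"):
    """Return the first API key found in the documented order, or None."""
    env = os.environ if env is None else env

    # 1. Environment variables.
    for name in KEY_NAMES:
        if env.get(name):
            return env[name]

    # 2. OpenClaw config: tts.elevenlabs.apiKey.
    config_path = (Path(openclaw_config) if openclaw_config
                   else Path.home() / ".openclaw" / "openclaw.json")
    if config_path.is_file():
        try:
            data = json.loads(config_path.read_text(encoding="utf-8"))
            key = data.get("tts", {}).get("elevenlabs", {}).get("apiKey")
            if key:
                return key
        except (json.JSONDecodeError, AttributeError):
            pass  # Malformed config: fall through to the next source.

    # 3. .env file with simple NAME=value lines.
    dotenv = Path(dotenv_path)
    if dotenv.is_file():
        for line in dotenv.read_text(encoding="utf-8").splitlines():
            name, sep, value = line.strip().partition("=")
            value = value.strip().strip("'\"")
            if sep and name.strip() in KEY_NAMES and value:
                return value
    return None
```

Passing `env` and the file paths as arguments keeps the function testable without touching the real environment or home directory.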
Each voice has tuned settings for optimal output:
| Setting | Range | Description |
|---------|-------|-------------|
| stability | 0.0-1.0 | Higher = consistent, lower = expressive |
| similarity_boost | 0.0-1.0 | How closely to match original voice |
| style | 0.0-1.0 | Exaggeration of speaking style |
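To show where these settings end up, the sketch below builds a request body with range checks. The field names (`model_id`, `voice_settings`) follow my understanding of the ElevenLabs text-to-speech request format and should be checked against the API reference. The default values and `build_tts_payload` itself are hypothetical; the per-voice tuned values live in voices.json.

```python
def build_tts_payload(text, stability=0.5, similarity_boost=0.75, style=0.0,
                      model_id="eleven_multilingual_v2"):
    """Build a text-to-speech request body, validating each setting's range.

    The defaults here are placeholders, not the skill's tuned values.
    """
    settings = {
        "stability": stability,                # higher = consistent, lower = expressive
        "similarity_boost": similarity_boost,  # how closely to match the original voice
        "style": style,                        # exaggeration of speaking style
    }
    for name, value in settings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {"text": text, "model_id": model_id, "voice_settings": settings}
```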
elevenlabs-voices/
├── SKILL.md              # This documentation
├── README.md             # Quick start guide
├── config.json           # Your local config (created by setup, in .gitignore)
├── voices.json           # Voice definitions & settings
├── pronunciations.json   # Custom pronunciation rules
├── examples.md           # Detailed usage examples
├── scripts/
│   ├── setup.py          # Interactive setup wizard
│   ├── tts.py            # Main TTS script
│   ├── sfx.py            # Sound effects generator
│   └── voice-design.py   # Voice design tool
└── references/
    └── voice-guide.md    # Voice selection guide
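- Interactive setup wizard (scripts/setup.py)
- Local config.json (added to .gitignore)
- Multi-language synthesis (--lang parameter)
- Streaming (--stream flag)
- Sound effects (sfx.py)
- Batch processing (--batch flag)
- Usage statistics (--stats flag)
- Voice design (voice-design.py)

Generated Mar 1, 2026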
Educational platforms can use this skill to generate voiceovers for online courses and tutorials in multiple languages. The educator and narrator presets are ideal for clear instructional audio, while batch processing handles large volumes of content efficiently.
Publishers and authors can synthesize audiobooks with expressive voices like George for storytelling. Streaming mode supports long texts, and custom pronunciation dictionaries ensure accurate narration of specialized terms.
Brands can create engaging audio for ads, podcasts, and social media posts using energetic or playful voices like Liam or Jessica. Sound effects add flair to promotional content, enhancing audience engagement.
Businesses can generate professional voiceovers for internal training videos and corporate announcements using voices like Matilda or Eric. Multilingual support allows for global team communications in languages like Spanish or German.
Developers can integrate this skill into apps to provide text-to-speech for visually impaired users, using neutral voices like River for inclusive content. Cost tracking helps manage usage in high-volume applications.
Offer a SaaS platform where users pay monthly for access to voice synthesis, leveraging ElevenLabs API. Include tiered plans based on character limits and premium features like voice design or batch processing.
Provide voiceover services for clients in industries like e-learning or marketing, using this skill to produce high-quality audio efficiently. Charge per project or hourly, with upselling for multilingual or custom voice options.
License the skill to other software companies for embedding into their products, such as video editors or e-learning platforms. Revenue comes from licensing fees or revenue-sharing agreements based on usage.
💬 Integration Tip
Set up the API key via the setup wizard first, then use simple CLI commands for quick testing; for advanced use, explore batch processing and voice design features.