clawvox
ClawVox - ElevenLabs voice studio for OpenClaw. Generate speech, transcribe audio, clone voices, create sound effects, and more.
Install via ClawdBot CLI:
clawdbot install abhishek-official1/clawvox
Transform your OpenClaw assistant into a professional voice production studio with ClawVox - powered by ElevenLabs.
| Action | Command | Description |
|--------|---------|-------------|
| Speak | {baseDir}/scripts/speak.sh 'text' | Convert text to speech |
| Transcribe | {baseDir}/scripts/transcribe.sh audio.mp3 | Speech to text |
| Clone | {baseDir}/scripts/clone.sh --name "Voice" sample.mp3 | Clone a voice |
| SFX | {baseDir}/scripts/sfx.sh "thunder storm" | Generate sound effects |
| Voices | {baseDir}/scripts/voices.sh list | List available voices |
| Dub | {baseDir}/scripts/dub.sh --target es audio.mp3 | Translate audio |
| Isolate | {baseDir}/scripts/isolate.sh audio.mp3 | Remove background noise |
~/.openclaw/openclaw.json:
{
  skills: {
    entries: {
      "clawvox": {
        apiKey: "YOUR_ELEVENLABS_API_KEY",
        config: {
          defaultVoice: "Rachel",
          defaultModel: "eleven_turbo_v2_5",
          outputDir: "~/.openclaw/audio"
        }
      }
    }
  }
}
Or set the environment variable:
export ELEVENLABS_API_KEY="your_api_key_here"
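Before running any of the scripts, it can help to confirm that the key is actually visible to the shell. The following is a minimal sketch, not part of ClawVox: `require_api_key` is a hypothetical helper name, and it checks only the environment variable, not the `openclaw.json` entry.

```bash
# Hypothetical helper (not shipped with ClawVox): fail early when the
# ELEVENLABS_API_KEY environment variable is missing or empty.
require_api_key() {
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY is not set" >&2
    return 1
  fi
  # Print only the last 4 characters so the full key never lands in logs.
  printf 'API key found (ends in %s)\n' \
    "$(printf '%s' "$ELEVENLABS_API_KEY" | tail -c 4)"
}
```

Calling `require_api_key || exit 1` at the top of a wrapper script stops the run before any API request is attempted.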
# Quick speak with default voice (Rachel)
{baseDir}/scripts/speak.sh 'Hello, I am your personal AI assistant.'
# Specify voice by name
{baseDir}/scripts/speak.sh --voice Adam 'Hello from Adam'
# Save to file
{baseDir}/scripts/speak.sh --out ~/audio/greeting.mp3 'Welcome to the show'
# Use specific model
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 'Bonjour'
# Adjust voice settings
{baseDir}/scripts/speak.sh --stability 0.5 --similarity 0.8 'Expressive speech'
# Adjust speed
{baseDir}/scripts/speak.sh --speed 1.2 'Faster speech'
# Use multilingual model for other languages
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 --voice Rachel 'Hola, que tal'
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 --voice Adam 'Guten Tag'
| Model | Latency | Languages | Best For |
|-------|---------|-----------|----------|
| eleven_flash_v2_5 | ~75ms | 32 | Real-time, streaming |
| eleven_turbo_v2_5 | ~250ms | 32 | Balanced quality/speed |
| eleven_multilingual_v2 | ~500ms | 29 | Long-form, highest quality |
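The table above can be turned into a small lookup so that calling scripts name a use case rather than a model ID. This is a sketch only: `pick_model` and its use-case keywords (`realtime`, `balanced`, `longform`, and so on) are assumptions made for illustration, not options that ClawVox defines.

```bash
# Hypothetical helper: map a use case to one of the model IDs in the table.
# The keywords on the left are illustrative; only the model IDs come from ClawVox.
pick_model() {
  case "$1" in
    realtime|streaming) echo "eleven_flash_v2_5" ;;       # lowest latency (~75ms)
    balanced)           echo "eleven_turbo_v2_5" ;;       # balanced quality/speed
    longform|quality)   echo "eleven_multilingual_v2" ;;  # long-form, highest quality
    *) echo "unknown use case: $1" >&2; return 1 ;;
  esac
}
```

For example, `{baseDir}/scripts/speak.sh --model "$(pick_model realtime)" 'Hello'` would select the flash model.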
Premade voices: Rachel, Adam, Antoni, Bella, Domi, Elli, Josh, Sam, Callum, Charlie, George, Liam, Matilda, Alice, Bill, Brian, Chris, Daniel, Eric, Jessica, Laura, Lily, River, Roger, Sarah, Will
# Generate audio from text file
{baseDir}/scripts/speak.sh --input chapter.txt --voice "George" --out audiobook.mp3
# Transcribe audio file
{baseDir}/scripts/transcribe.sh recording.mp3
# Save to file
{baseDir}/scripts/transcribe.sh --out transcript.txt audio.mp3
# Transcribe with language hint
{baseDir}/scripts/transcribe.sh --language es spanish_audio.mp3
# Include timestamps
{baseDir}/scripts/transcribe.sh --timestamps podcast.mp3
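When several recordings need transcripts, the `--out` option shown above can be driven from a loop. The sketch below assumes a `TRANSCRIBE` variable that holds the path to `transcribe.sh`. Both that variable and the helper names are hypothetical.

```bash
# Hypothetical helper: derive a transcript path next to the audio file,
# e.g. talks/podcast.mp3 -> talks/podcast.txt.
# Assumes the file name has an extension.
transcript_path() {
  printf '%s.txt\n' "${1%.*}"
}

# Transcribe every file passed as an argument, stopping at the first failure.
# TRANSCRIBE is an assumed variable pointing at transcribe.sh.
transcribe_all() {
  local audio
  for audio in "$@"; do
    "$TRANSCRIBE" --out "$(transcript_path "$audio")" "$audio" || return 1
  done
}
```

With `TRANSCRIBE={baseDir}/scripts/transcribe.sh`, running `transcribe_all interviews/*.mp3` would write one `.txt` file beside each recording.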
# Clone from single sample (minimum 30 seconds recommended)
{baseDir}/scripts/clone.sh --name MyVoice recording.mp3
# Clone with description
{baseDir}/scripts/clone.sh --name BusinessVoice \
  --description 'Professional male voice' \
  sample.mp3
# Clone with labels
{baseDir}/scripts/clone.sh --name MyVoice \
  --labels '{"gender":"male","age":"adult"}' \
  sample.mp3
# Remove background noise during cloning
{baseDir}/scripts/clone.sh --name CleanVoice \
  --remove-bg-noise \
  sample.mp3
# Test cloned voice
{baseDir}/scripts/speak.sh --voice MyVoice 'Testing my cloned voice'
# List all available voices
{baseDir}/scripts/voices.sh list
# Get voice details
{baseDir}/scripts/voices.sh info --name Rachel
{baseDir}/scripts/voices.sh info --id 21m00Tcm4TlvDq8ikWAM
# Search voices (filter output with grep)
{baseDir}/scripts/voices.sh list | grep -i "female"
# Filter by category
{baseDir}/scripts/voices.sh list --category premade
{baseDir}/scripts/voices.sh list --category cloned
# Download voice preview
{baseDir}/scripts/voices.sh preview --name Rachel -o preview.mp3
# Delete custom voice
{baseDir}/scripts/voices.sh delete --id "voice_id"
# Generate sound effect
{baseDir}/scripts/sfx.sh 'Heavy rain on a tin roof'
# With duration
{baseDir}/scripts/sfx.sh --duration 5 'Forest ambiance with birds'
# With prompt influence (higher = more accurate)
{baseDir}/scripts/sfx.sh --influence 0.8 'Sci-fi laser gun firing'
# Save to file
{baseDir}/scripts/sfx.sh --out effects/thunder.mp3 'Rolling thunder'
Note: Duration range is 0.5 to 22 seconds (rounded to nearest 0.5)
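To preview which value a requested duration would map to under that rule, the clamping and rounding can be reproduced locally. This is an illustrative sketch: `normalize_duration` is a hypothetical helper, and whether `sfx.sh` rounds halves up in exactly this way is an assumption.

```bash
# Hypothetical helper: clamp a duration to 0.5..22 seconds and round it
# to the nearest 0.5, mirroring the range stated in the note above.
# Expects a numeric argument.
normalize_duration() {
  LC_ALL=C awk -v d="$1" 'BEGIN {
    if (d < 0.5) d = 0.5
    if (d > 22)  d = 22
    # Doubling, rounding to an integer, then halving gives steps of 0.5.
    printf "%.1f\n", int(d * 2 + 0.5) / 2
  }'
}
```

For instance, `{baseDir}/scripts/sfx.sh --duration "$(normalize_duration 3.3)" 'Forest ambiance with birds'` would request 3.5 seconds.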
# Remove background noise and isolate voice
{baseDir}/scripts/isolate.sh noisy_recording.mp3
# Save to specific file
{baseDir}/scripts/isolate.sh --out clean_voice.mp3 meeting_recording.mp3
# Don't tag audio events
{baseDir}/scripts/isolate.sh --no-audio-events recording.mp3
Requirements:
# Dub audio to Spanish
{baseDir}/scripts/dub.sh --target es audio.mp3
# Dub with source language specified
{baseDir}/scripts/dub.sh --source en --target ja video.mp4
# Check dubbing status
{baseDir}/scripts/dub.sh --status --id "dubbing_id"
# Download dubbed audio
{baseDir}/scripts/dub.sh --download --id "dubbing_id" --out dubbed.mp3
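Dubbing runs asynchronously, so the status check usually has to be repeated before the download succeeds. The sketch below is a generic polling loop. `wait_for` is a hypothetical helper, and the idea that the status output contains the word `dubbed` once the job is finished is an assumption about what `dub.sh --status` prints.

```bash
# Hypothetical helper: run a command repeatedly until its output matches a pattern.
# Usage: wait_for PATTERN MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  local pattern=$1 tries=$2 delay=$3
  shift 3
  while [ "$tries" -gt 0 ]; do
    # Succeed as soon as the command output contains the pattern.
    if "$@" 2>/dev/null | grep -q "$pattern"; then
      return 0
    fi
    tries=$((tries - 1))
    # Sleep only when another attempt remains.
    if [ "$tries" -gt 0 ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

Under those assumptions, `wait_for dubbed 30 10 {baseDir}/scripts/dub.sh --status --id "dubbing_id"` would poll every 10 seconds for up to 30 attempts, after which the `--download` command can be run.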
Supported languages: en, es, fr, de, it, pt, pl, hi, ar, zh, ja, ko, nl, ru, tr, vi, sv, da, fi, cs, el, he, id, ms, no, ro, uk, hu, th
For direct API access, all scripts use curl under the hood:
# Direct TTS API call
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "model_id": "eleven_turbo_v2_5"}' \
  --output speech.mp3
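When the text comes from a variable rather than a literal, quotes and backslashes must be escaped before they go into the JSON body. The following is a minimal sketch with hypothetical helper names. It handles only double quotes and backslashes in single-line text. A JSON tool such as `jq` would be more robust for arbitrary input.

```bash
# Hypothetical helper: escape backslashes and double quotes for use
# inside a JSON string. Newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body used by the direct TTS call.
# Arguments: TEXT MODEL_ID
build_tts_payload() {
  printf '{"text": "%s", "model_id": "%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}
```

The result can be passed to the call above with `-d "$(build_tts_payload "$text" eleven_turbo_v2_5)"`.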
All scripts provide helpful error messages:
Run the test suite to verify everything works:
{baseDir}/test.sh YOUR_API_KEY
Or with environment variable:
export ELEVENLABS_API_KEY="your_key"
{baseDir}/test.sh
tools.exec.host: "sandbox"
Use single quotes: 'Hello world' not "Hello world!"
Avoid exclamation marks (!) in text when using double quotes
Use the --input option with a file
Check that ELEVENLABS_API_KEY is set or configured in openclaw.json
Install jq: apt-get install jq (Linux) or brew install jq (macOS)
Run {baseDir}/scripts/voices.sh list to see available voices
# Enable verbose output
DEBUG=1 {baseDir}/scripts/speak.sh 'test'
# Show API request details
DEBUG=1 {baseDir}/scripts/transcribe.sh audio.mp3
ElevenLabs API pricing (approximate):
Free tier: ~10,000 characters/month
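Because the free tier is metered in characters, a rough estimate of how much of the allowance a script consumes can be computed before synthesis. This sketch uses the approximate 10,000-character figure quoted above. The helper names are hypothetical, and ElevenLabs may count characters differently from `wc -m`.

```bash
# Approximate free-tier allowance quoted above (characters per month).
FREE_TIER_CHARS=10000

# Hypothetical helper: count characters (not bytes) in a string.
char_count() {
  printf '%s' "$1" | wc -m | tr -d ' '
}

# Hypothetical helper: integer percentage of the monthly allowance
# that N characters would use.
quota_percent() {
  echo $(( $1 * 100 / FREE_TIER_CHARS ))
}
```

For a text file, `quota_percent "$(char_count "$(cat chapter.txt)")"` gives a quick estimate before running `speak.sh --input chapter.txt`.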
Generated Mar 1, 2026
Automate podcast creation by generating host voices, transcribing interviews, and adding sound effects. Use voice cloning for consistent host voices across episodes and dub content for international audiences.
Create multilingual educational audio from text scripts, clone instructor voices for consistency, and transcribe lectures for accessibility. Dub existing courses into target languages to expand global reach.
Build AI-powered voice assistants with natural-sounding speech, clone brand-specific voices for a personalized touch, and transcribe customer calls for analysis. Use voice isolation to clean up noisy recordings.
Generate high-quality audiobooks from text files using long-form models, clone narrators' voices for series consistency, and add sound effects for immersive storytelling. Transcribe audio for subtitles or scripts.
Produce voiceovers for ads in multiple languages, clone celebrity or brand ambassador voices for campaigns, and create custom sound effects for branding. Use dubbing to adapt existing audio content for new markets.
Offer monthly plans for businesses to generate speech, transcribe audio, and clone voices. Tier pricing based on usage limits, such as hours of audio generated or number of voice clones, with premium support.
Resell ElevenLabs API access with markup, charging per request for TTS, transcription, or voice cloning. Target developers and small teams with flexible pricing and bundled services like voice isolation.
License the skill as a branded solution for agencies or enterprises, integrating it into their workflows for voiceovers, dubbing, and audio editing. Charge upfront licensing fees plus ongoing maintenance.
💬 Integration Tip
Set up the ELEVENLABS_API_KEY environment variable first for quick testing, then configure detailed settings like default voice and output directory in openclaw.json for production use.