elevenlabs-voice
Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.
Install via ClawdBot CLI:
clawdbot install amreahmed/elevenlabs-voice
A complete voice solution with both TTS and STT through one API.
Set your API key:
export ELEVENLABS_API_KEY="sk_..."
Or create a .env file in the workspace root.
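As a rough sketch of how the two configuration options above could be resolved in Python, the helper below checks the environment first and falls back to a `.env` file. The function name `load_api_key` and the simple `KEY=value` parsing are assumptions for illustration; the skill's own scripts may load the key differently.

```python
import os
from pathlib import Path


def load_api_key(env_path=".env", var="ELEVENLABS_API_KEY"):
    """Return the API key from the environment, else from a .env file.

    Hypothetical helper: not part of the skill's scripts.
    """
    key = os.environ.get(var)
    if key:
        return key

    path = Path(env_path)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            # Accept an optional "export " prefix, as in the shell example.
            if line.startswith("export "):
                line = line[len("export "):]
            name, sep, value = line.partition("=")
            if sep and name.strip() == var:
                return value.strip().strip("\"'")

    raise RuntimeError(f"{var} is not set and was not found in {env_path}")
```

Failing early with a clear error keeps a missing key from surfacing later as an opaque HTTP authentication failure.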
Convert text to natural-sounding speech:
python scripts/elevenlabs_speech.py tts -t "Hello world" -o greeting.mp3
With custom voice:
python scripts/elevenlabs_speech.py tts -t "Hello" -v "voice_id_here" -o output.mp3
List available voices:
python scripts/elevenlabs_speech.py voices
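When the CLI needs to be called from another Python process, the commands above could be wrapped roughly as follows. This is a minimal sketch: `build_tts_command` and `run_tts` are hypothetical helper names, and only the `tts` subcommand and the `-t`, `-v`, and `-o` flags shown above are used.

```python
import subprocess
import sys


def build_tts_command(text, output_path, voice_id=None,
                      script="scripts/elevenlabs_speech.py"):
    """Build the argv list for the tts subcommand shown above."""
    cmd = [sys.executable, script, "tts", "-t", text, "-o", output_path]
    if voice_id:
        cmd += ["-v", voice_id]
    return cmd


def run_tts(text, output_path, voice_id=None):
    """Run the CLI; subprocess raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_tts_command(text, output_path, voice_id), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the text contains spaces or quotes.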
from scripts.elevenlabs_speech import ElevenLabsClient
client = ElevenLabsClient(api_key="sk_...")
# Basic TTS
result = client.text_to_speech(
    text="Hello from zerox",
    output_path="greeting.mp3",
)
# With custom settings
result = client.text_to_speech(
    text="Your text here",
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
    stability=0.5,
    similarity_boost=0.75,
    output_path="output.mp3",
)
# Get available voices
voices = client.get_voices()
for voice in voices['voices']:
    print(f"{voice['name']}: {voice['voice_id']}")
| Voice ID | Name | Description |
|----------|------|-------------|
| 21m00Tcm4TlvDq8ikWAM | Rachel | Natural, versatile (default) |
| AZnzlk1XvdvUeBnXmlld | Domi | Strong, energetic |
| EXAVITQu4vr4xnSDxMaL | Bella | Soft, soothing |
| ErXwobaYiN019PkySvjV | Antoni | Well-rounded |
| MF3mGyEYCl7XYWbV9V6O | Elli | Warm, friendly |
| TxGEqnHWrfWFTfGW9XjX | Josh | Deep, calm |
| VR6AewLTigWG4xSOukaG | Arnold | Authoritative |
Default: stability=0.5, similarity_boost=0.75
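The voice table and defaults above can be captured in a small lookup, sketched below. The helper names `resolve_voice` and `voice_settings` are hypothetical, and the 0.0 to 1.0 range check is an assumption about the accepted values rather than something this skill documents.

```python
# Voice IDs copied from the table above.
VOICES = {
    "rachel": "21m00Tcm4TlvDq8ikWAM",
    "domi": "AZnzlk1XvdvUeBnXmlld",
    "bella": "EXAVITQu4vr4xnSDxMaL",
    "antoni": "ErXwobaYiN019PkySvjV",
    "elli": "MF3mGyEYCl7XYWbV9V6O",
    "josh": "TxGEqnHWrfWFTfGW9XjX",
    "arnold": "VR6AewLTigWG4xSOukaG",
}
DEFAULT_VOICE = "rachel"


def resolve_voice(name_or_id=None):
    """Map a voice name from the table to its ID; pass other values through as IDs."""
    if not name_or_id:
        return VOICES[DEFAULT_VOICE]
    return VOICES.get(name_or_id.lower(), name_or_id)


def voice_settings(stability=0.5, similarity_boost=0.75):
    """Return keyword arguments for text_to_speech, rejecting out-of-range values.

    Assumption: both settings are expected to lie in [0.0, 1.0].
    """
    for label, value in (("stability", stability),
                         ("similarity_boost", similarity_boost)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must be between 0.0 and 1.0, got {value}")
    return {"stability": stability, "similarity_boost": similarity_boost}
```

A call might then look like `client.text_to_speech(text="Hi", voice_id=resolve_voice("Josh"), output_path="hi.mp3", **voice_settings(stability=0.3))`.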
eleven_turbo_v2_5 - Fast, high quality (default)
eleven_multilingual_v2 - Best for non-English
eleven_monolingual_v1 - English only
When the user sends text and wants a voice reply:
# Generate speech
result = client.text_to_speech(text=user_text, output_path="reply.mp3")
# Send via Telegram message tool with media path
message(action="send", media="path/to/reply.mp3", as_voice=True)
Check https://elevenlabs.io/pricing for current rates. Free tier available!
Transcribe voice messages using ElevenLabs Scribe:
python scripts/elevenlabs_scribe.py voice_message.ogg
With specific language:
python scripts/elevenlabs_scribe.py voice_message.ogg --language ara
With speaker diarization (multiple speakers):
python scripts/elevenlabs_scribe.py voice_message.ogg --speakers 2
from scripts.elevenlabs_scribe import ElevenLabsScribe
client = ElevenLabsScribe(api_key="sk_...")
# Basic transcription
result = client.transcribe("voice_message.ogg")
print(result['text'])
# With language hint (improves accuracy)
result = client.transcribe("voice_message.ogg", language_code="ara")
# With speaker detection
result = client.transcribe("voice_message.ogg", num_speakers=2)
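With speaker detection enabled, the transcript usually needs to be split into turns. The sketch below assumes the result carries a `words` list whose items are dicts with `text` and `speaker_id` keys; that shape is an assumption about what `transcribe` returns, so check the actual result before relying on it. `group_by_speaker` is a hypothetical helper.

```python
def group_by_speaker(words):
    """Merge consecutive words from the same speaker into (speaker, text) turns.

    Assumption: each item is a dict with "text" and "speaker_id" keys.
    """
    turns = []
    for word in words:
        text = word.get("text", "").strip()
        if not text:
            continue  # skip whitespace-only tokens
        speaker = word.get("speaker_id")
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(text)
        else:
            turns.append((speaker, [text]))
    return [(speaker, " ".join(parts)) for speaker, parts in turns]
```

Under that assumption, `group_by_speaker(result.get("words", []))` yields one entry per uninterrupted stretch of speech.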
Scribe supports 99 languages including:
Arabic (ara)
English (eng)
Spanish (spa)
French (fra)
Without a language hint, it auto-detects the language.
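Chat platforms often report a two-letter or locale-style language tag, while the examples here use three-letter codes. A minimal sketch of normalizing such a hint follows; `to_language_code` is a hypothetical helper and the mapping covers only the four codes listed above.

```python
# Three-letter codes listed above, keyed by their common two-letter equivalents.
LANGUAGE_CODES = {"ar": "ara", "en": "eng", "es": "spa", "fr": "fra"}


def to_language_code(hint=None):
    """Normalize a language hint to a three-letter code, or None for auto-detect."""
    if not hint:
        return None  # no hint: leave detection to Scribe
    # Reduce locale tags such as "en-US" or "en_US" to their language part.
    base = hint.strip().lower().replace("_", "-").split("-")[0]
    if base in LANGUAGE_CODES.values():
        return base
    if base in LANGUAGE_CODES:
        return LANGUAGE_CODES[base]
    raise ValueError(f"unknown language hint: {hint!r}")
```

When the helper returns `None`, call `transcribe` without `language_code` so that auto-detection applies.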
User sends voice message → You reply with voice:
from scripts.elevenlabs_scribe import ElevenLabsScribe
from scripts.elevenlabs_speech import ElevenLabsClient
# 1. Transcribe user's voice message
stt = ElevenLabsScribe()
transcription = stt.transcribe("user_voice.ogg")
user_text = transcription['text']
# 2. Process/understand the text
# ... your logic here ...
# 3. Generate response text
response_text = "Your response here"
# 4. Convert to speech
tts = ElevenLabsClient()
tts.text_to_speech(response_text, output_path="reply.mp3")
# 5. Send voice reply
message(action="send", media="reply.mp3", as_voice=True)
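The workflow above can be wrapped in one function so the steps are testable without calling the API. This is a sketch, not the skill's own code: `reply_with_voice` is a hypothetical name, and the three callables are passed in so that `stt.transcribe`, your own response logic, and `tts.text_to_speech` can be swapped for fakes in tests.

```python
def reply_with_voice(audio_path, transcribe, respond, synthesize,
                     output_path="reply.mp3"):
    """Run the transcribe -> respond -> synthesize steps and return the reply path.

    transcribe(path)                  -> dict with a "text" key
    respond(text)                     -> reply text (your own logic)
    synthesize(text, output_path=...) -> writes the audio file
    """
    user_text = transcribe(audio_path)["text"].strip()
    if not user_text:
        raise ValueError("transcription was empty; nothing to reply to")
    response_text = respond(user_text)
    synthesize(response_text, output_path=output_path)
    return output_path
```

With the clients from the workflow above, this could be called as `reply_with_voice("user_voice.ogg", stt.transcribe, my_logic, tts.text_to_speech)`, where `my_logic` stands in for your own function, and the returned path passed to the message tool.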
Check https://elevenlabs.io/pricing for current rates for both TTS (Text-to-Speech) and STT (Speech-to-Text, Scribe).
Generated Mar 1, 2026
Integrate with Telegram bots to transcribe user voice messages and reply with AI-generated voice responses. Enables hands-free communication for users who prefer voice interactions over text, enhancing accessibility and engagement.
Use STT to transcribe customer voice inquiries in various languages and TTS to generate voice replies in the customer's preferred language. Improves support efficiency and personalization for global businesses.
Convert written educational materials into high-quality voiceovers for podcasts, e-learning modules, or audiobooks. Supports multiple languages and voices to cater to diverse audiences and learning styles.
Develop applications that read out text content (e.g., articles, emails) using TTS and transcribe voice commands via STT. Enhances digital accessibility by enabling voice-based navigation and content consumption.
Integrate with smart home devices or IoT systems to process voice commands via STT and provide audible feedback using TTS. Enables natural voice interactions for controlling devices or accessing information.
Offer a free tier with limited characters or transcriptions per month, then charge for higher usage tiers or premium features like advanced voices or faster processing. Attracts users with no upfront cost and monetizes heavy usage.
Provide the skill as an API service for developers to integrate into their applications, charging per API call or based on data processed (e.g., per character for TTS, per minute for STT). Scales with customer usage.
License the technology to businesses for embedding into their own products under their brand, with custom pricing based on features and support. Targets enterprises needing tailored voice solutions.
💬 Integration Tip
Start by setting up the API key in environment variables and test with basic TTS and STT scripts before integrating into larger projects to ensure compatibility.