# eachlabs-voice-audio

Text-to-speech, speech-to-text, voice conversion, and audio processing using EachLabs AI models. Supports ElevenLabs TTS, Whisper transcription with diarization, and RVC voice conversion. Use when the user needs TTS, transcription, or voice conversion.
Install via ClawdBot CLI: `clawdbot install eftalyurtseven/eachlabs-voice-audio`

Text-to-speech, speech-to-text transcription, voice conversion, and audio utilities via the EachLabs Predictions API.
Authentication header: `X-API-Key: <your-api-key>`

Set the `EACHLABS_API_KEY` environment variable. Get your key at eachlabs.ai.
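Every request in this document carries the same two headers. A minimal Python helper for building them from the environment variable described above (an illustrative sketch, not an official SDK):

```python
import os

def eachlabs_headers() -> dict:
    """Build the headers required by every EachLabs API call.

    Reads the API key from the EACHLABS_API_KEY environment variable.
    """
    api_key = os.environ.get("EACHLABS_API_KEY")
    if not api_key:
        raise RuntimeError("EACHLABS_API_KEY is not set")
    return {
        "Content-Type": "application/json",
        "X-API-Key": api_key,
    }
```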
**Text-to-Speech**

| Model | Slug | Best For |
|-------|------|----------|
| ElevenLabs TTS | elevenlabs-text-to-speech | High quality TTS |
| ElevenLabs TTS w/ Timestamps | elevenlabs-text-to-speech-with-timestamp | TTS with word timing |
| ElevenLabs Text to Dialogue | elevenlabs-text-to-dialogue | Multi-speaker dialogue |
| ElevenLabs Sound Effects | elevenlabs-sound-effects | Sound effect generation |
| ElevenLabs Voice Design v2 | elevenlabs-voice-design-v2 | Custom voice design |
| Kling V1 TTS | kling-v1-tts | Kling text-to-speech |
| Kokoro 82M | kokoro-82m | Lightweight TTS |
| Play AI Dialog | play-ai-text-to-speech-dialog | Dialog TTS |
| Stable Audio 2.5 | stable-audio-2-5-text-to-audio | Text to audio |
**Speech-to-Text**

| Model | Slug | Best For |
|-------|------|----------|
| ElevenLabs Scribe v2 | elevenlabs-speech-to-text-scribe-v2 | Best quality transcription |
| ElevenLabs STT | elevenlabs-speech-to-text | Standard transcription |
| Wizper with Timestamp | wizper-with-timestamp | Timestamped transcription |
| Wizper | wizper | Basic transcription |
| Whisper | whisper | Open-source transcription |
| Whisper Diarization | whisper-diarization | Speaker identification |
| Incredibly Fast Whisper | incredibly-fast-whisper | Fastest transcription |
**Voice Conversion & Audio**

| Model | Slug | Best For |
|-------|------|----------|
| RVC v2 | rvc-v2 | Voice conversion |
| Train RVC | train-rvc | Train custom voice model |
| ElevenLabs Voice Clone | elevenlabs-voice-clone | Voice cloning |
| ElevenLabs Voice Changer | elevenlabs-voice-changer | Voice transformation |
| ElevenLabs Voice Design v3 | elevenlabs-voice-design-v3 | Advanced voice design |
| ElevenLabs Dubbing | elevenlabs-dubbing | Video dubbing |
| Chatterbox S2S | chatterbox-speech-to-speech | Speech to speech |
| Open Voice | openvoice | Open-source voice clone |
| XTTS v2 | xtts-v2 | Multi-language voice clone |
| Stable Audio 2.5 Inpaint | stable-audio-2-5-inpaint | Audio inpainting |
| Stable Audio 2.5 A2A | stable-audio-2-5-audio-to-audio | Audio transformation |
| Audio Trimmer | audio-trimmer-with-fade | Audio trimming with fade |
**Utilities**

| Model | Slug | Best For |
|-------|------|----------|
| FFmpeg Merge Audio Video | ffmpeg-api-merge-audio-video | Merge audio with video |
| Toolkit Video Convert | toolkit | Video/audio conversion |
**Workflow**

1. `GET https://api.eachlabs.ai/v1/model?slug={slug}` validates the model exists and returns the `request_schema` with the exact input parameters. Always do this before creating a prediction to ensure correct inputs.
2. `POST https://api.eachlabs.ai/v1/prediction` with the model slug, version `"0.0.1"`, and an `input` matching the schema.
3. Poll `GET https://api.eachlabs.ai/v1/prediction/{id}` until status is `"success"` or `"failed"`.

Text-to-speech example:

```bash
curl -X POST https://api.eachlabs.ai/v1/prediction \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -d '{
    "model": "elevenlabs-text-to-speech",
    "version": "0.0.1",
    "input": {
      "text": "Welcome to our product demo. Today we will walk through the key features.",
      "voice_id": "EXAVITQu4vr4xnSDxMaL",
      "model_id": "eleven_v3",
      "stability": 0.5,
      "similarity_boost": 0.7
    }
  }'
```
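The create-then-poll workflow can be sketched in Python. This is a minimal sketch, not an official client: the HTTP calls are injected as callables so the polling logic stands on its own, and the response fields (`id`, `status`) follow the shape implied by the steps above.

```python
import time

def create_and_poll(post, get, model: str, inputs: dict,
                    interval: float = 2.0, max_polls: int = 150) -> dict:
    """Create a prediction, then poll until it settles.

    `post(path, payload)` and `get(path)` are caller-supplied functions
    that perform the actual HTTP requests and return parsed JSON.
    """
    created = post("/v1/prediction",
                   {"model": model, "version": "0.0.1", "input": inputs})
    pred_id = created["id"]
    for _ in range(max_polls):
        pred = get(f"/v1/prediction/{pred_id}")
        if pred["status"] in ("success", "failed"):
            return pred
        time.sleep(interval)
    raise TimeoutError(f"prediction {pred_id} did not settle")
```

Injecting the transport also makes the loop easy to test with stubs before pointing it at the live API.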
Transcription with diarization (Scribe v2):

```bash
curl -X POST https://api.eachlabs.ai/v1/prediction \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -d '{
    "model": "elevenlabs-speech-to-text-scribe-v2",
    "version": "0.0.1",
    "input": {
      "media_url": "https://example.com/recording.mp3",
      "diarize": true,
      "timestamps_granularity": "word"
    }
  }'
```
Timestamped transcription (Wizper):

```bash
curl -X POST https://api.eachlabs.ai/v1/prediction \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -d '{
    "model": "wizper-with-timestamp",
    "version": "0.0.1",
    "input": {
      "audio_url": "https://example.com/audio.mp3",
      "language": "en",
      "task": "transcribe",
      "chunk_level": "segment"
    }
  }'
```
Speaker diarization (Whisper):

```bash
curl -X POST https://api.eachlabs.ai/v1/prediction \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -d '{
    "model": "whisper-diarization",
    "version": "0.0.1",
    "input": {
      "file_url": "https://example.com/meeting.mp3",
      "num_speakers": 3,
      "language": "en",
      "group_segments": true
    }
  }'
```
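Diarization output typically arrives as a list of segments. A small formatter like the one below turns them into readable speaker-labeled lines. The field names (`speaker`, `start`, `end`, `text`) are a common diarization output shape, not a documented EachLabs schema; check the model's actual response before relying on them.

```python
def format_diarized(segments: list) -> str:
    """Render diarized segments as '[start-end] SPEAKER: text' lines.

    Assumes each segment dict carries 'speaker', 'start', 'end', and
    'text' keys (an assumption -- verify against the real response).
    """
    lines = []
    for seg in segments:
        stamp = f"[{seg['start']:.1f}-{seg['end']:.1f}]"
        lines.append(f"{stamp} {seg['speaker']}: {seg['text'].strip()}")
    return "\n".join(lines)
```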
Voice conversion (RVC v2) with a custom model:

```bash
curl -X POST https://api.eachlabs.ai/v1/prediction \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -d '{
    "model": "rvc-v2",
    "version": "0.0.1",
    "input": {
      "input_audio": "https://example.com/vocals.wav",
      "rvc_model": "CUSTOM",
      "custom_rvc_model_download_url": "https://example.com/my-voice-model.zip",
      "pitch_change": 0,
      "output_format": "wav"
    }
  }'
```
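In the RVC request above, `custom_rvc_model_download_url` is only meaningful when `rvc_model` is `"CUSTOM"`. A small payload builder (an illustrative helper, not part of any official SDK) can enforce that pairing before the request is sent:

```python
def rvc_input(input_audio: str, rvc_model: str = "CUSTOM",
              custom_model_url: str = "",
              pitch_change: int = 0, output_format: str = "wav") -> dict:
    """Assemble the 'input' payload for the rvc-v2 model."""
    payload = {
        "input_audio": input_audio,
        "rvc_model": rvc_model,
        "pitch_change": pitch_change,
        "output_format": output_format,
    }
    if rvc_model == "CUSTOM":
        if not custom_model_url:
            raise ValueError("CUSTOM rvc_model requires a model download URL")
        payload["custom_rvc_model_download_url"] = custom_model_url
    return payload
```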
Merge audio with video (FFmpeg):

```bash
curl -X POST https://api.eachlabs.ai/v1/prediction \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -d '{
    "model": "ffmpeg-api-merge-audio-video",
    "version": "0.0.1",
    "input": {
      "video_url": "https://example.com/video.mp4",
      "audio_url": "https://example.com/narration.mp3",
      "start_offset": 0
    }
  }'
```
The elevenlabs-text-to-speech model supports these voice IDs. Pass the raw ID string:
| Voice ID | Notes |
|----------|-------|
| EXAVITQu4vr4xnSDxMaL | Default voice |
| 9BWtsMINqrJLrRacOk9x | |
| CwhRBWXzGAHq8TQ4Fs17 | |
| FGY2WhTYpPnrIDTdsKH5 | |
| JBFqnCBsd6RMkjVDRZzb | |
| N2lVS1w4EtoT3dr4eOWO | |
| TX3LPaxmHKxFdv7VOQHJ | |
| XB0fDUnXU5powFXDhCwa | |
| onwK4e9ZLuTAKqWW03F9 | |
| pFZP5JQG7iQjIQuC4Bku | |
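A quick guard against mistyped voice IDs, using the table above as the allowed set (illustrative only; the API may accept other ElevenLabs voice IDs beyond those listed):

```python
# Voice IDs documented for elevenlabs-text-to-speech (from the table above).
KNOWN_VOICE_IDS = {
    "EXAVITQu4vr4xnSDxMaL",  # default voice
    "9BWtsMINqrJLrRacOk9x",
    "CwhRBWXzGAHq8TQ4Fs17",
    "FGY2WhTYpPnrIDTdsKH5",
    "JBFqnCBsd6RMkjVDRZzb",
    "N2lVS1w4EtoT3dr4eOWO",
    "TX3LPaxmHKxFdv7VOQHJ",
    "XB0fDUnXU5powFXDhCwa",
    "onwK4e9ZLuTAKqWW03F9",
    "pFZP5JQG7iQjIQuC4Bku",
}

def check_voice_id(voice_id: str) -> str:
    """Return the voice ID if it is in the documented set, else raise."""
    if voice_id not in KNOWN_VOICE_IDS:
        raise ValueError(f"unknown voice_id: {voice_id!r}")
    return voice_id
```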
See references/MODELS.md for complete parameter details for each model.
Generated Mar 1, 2026
**Use cases**

- Use TTS models like ElevenLabs to generate high-quality voiceovers for podcast episodes from written scripts, and apply audio trimming utilities for editing. This streamlines content creation for solo creators or small teams.
- Leverage Whisper Diarization to transcribe business meetings with speaker identification, enabling automated note-taking and insights extraction for improved collaboration and record-keeping.
- Employ ElevenLabs TTS and dubbing models to create multilingual voiceovers for educational videos, making content accessible to diverse audiences and enhancing engagement in online courses.
- Utilize voice cloning and conversion models like RVC v2 to generate personalized voice responses in chatbots or IVR systems, providing a consistent and branded customer experience.
- Apply audio utilities such as FFmpeg merge to sync high-quality TTS audio with video content, and use sound effects generation for post-production in marketing or film projects.
**Business model ideas**

- Offer a cloud-based platform where users pay monthly for access to TTS, transcription, and voice conversion APIs, targeting developers and businesses needing scalable audio solutions.
- Provide branded audio processing services to other companies, such as call centers or e-learning platforms, integrating the skill's models into their existing products for a licensing fee.
- Offer basic TTS and transcription features for free to attract individual users, with premium upgrades for advanced models like voice cloning or high-volume usage, driving conversions through value-added services.
**Integration tip:** Always validate model schemas via the API before creating predictions; this ensures correct inputs and reduces errors in production workflows.
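That validation step can be sketched as a check of a prediction's `input` against the `request_schema` returned by `GET /v1/model?slug={slug}`. The schema is assumed here to be JSON-Schema-like, with `properties` and `required` keys; verify the real shape against an actual response before depending on it.

```python
def validate_input(request_schema: dict, inputs: dict) -> list:
    """Return a list of problems; an empty list means inputs look valid.

    Assumes a JSON-Schema-like request_schema with 'properties' and
    'required' keys (an assumption -- confirm via GET /v1/model?slug=...).
    """
    problems = []
    props = request_schema.get("properties", {})
    for key in request_schema.get("required", []):
        if key not in inputs:
            problems.append(f"missing required field: {key}")
    for key in inputs:
        if props and key not in props:
            problems.append(f"unknown field: {key}")
    return problems
```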
Related skills:

- Transcribe audio via the OpenAI Audio Transcriptions API (Whisper).
- Local speech-to-text with the Whisper CLI (no API key).
- ElevenLabs text-to-speech with mac-style `say` UX.
- Text-to-speech via the node-edge-tts npm package, supporting multiple voices, languages, speed and pitch control, and subtitle generation.
- Text-to-speech via the OpenAI Audio Speech API.