# clonev

Clone any voice and generate speech using Coqui XTTS v2. SUPER SIMPLE: provide a voice sample (6-30 sec WAV) and text, and get cloned-voice audio. Supports 16 languages. Use when the user wants to (1) clone their own or someone else's voice, (2) generate speech that sounds like a specific person, (3) create personalized voice messages, or (4) do multilingual voice cloning (speak any language with a cloned voice).
Install via ClawdBot CLI:

```bash
clawdbot install instant-picture/clonev
```

DO NOT try to use Docker containers directly.
DO NOT try to interact with the coqui-xtts container; it is broken and restarting.
DO NOT try to use APIs or servers.
ONLY USE THE SCRIPT: scripts/clonev.sh
The script handles everything automatically. Just call it with text, voice sample, and language.
Clones any voice from a short audio sample and generates new speech in that voice.
Input: text to speak, a voice sample (6-30 second WAV file), and a language code
Output: OGG voice file (cloned voice speaking the text)
Works with: Any voice! Yours, a celebrity, a character, etc.
```bash
VOICE_FILE=$(scripts/clonev.sh "Your text here" "/path/to/voice_sample.wav" language)
```

That's it! Nothing else needed. The variable $VOICE_FILE now contains the path to the generated OGG file.
```bash
# Generate cloned voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Hello, this is my cloned voice!" "/mnt/c/TEMP/Recording 25.wav" en)

# Send to Telegram (as voice message)
message action=send channel=telegram asVoice=true filePath="$VOICE"

# Generate Czech voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj, tohle je můj hlas" "/mnt/c/TEMP/Recording 25.wav" cs)

# Send
message action=send channel=telegram asVoice=true filePath="$VOICE"
```
```bash
#!/bin/bash
# Generate voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Task completed!" "/path/to/sample.wav" en)

# Verify file was created
if [ -f "$VOICE" ]; then
  echo "Success! Voice file: $VOICE"
  ls -lh "$VOICE"
else
  echo "Error: Voice file not created"
fi
```
| Code | Language | Example Usage |
|------|----------|---------------|
| en | English | scripts/clonev.sh "Hello" sample.wav en |
| cs | Czech | scripts/clonev.sh "Ahoj" sample.wav cs |
| de | German | scripts/clonev.sh "Hallo" sample.wav de |
| fr | French | scripts/clonev.sh "Bonjour" sample.wav fr |
| es | Spanish | scripts/clonev.sh "Hola" sample.wav es |
Full list: en, cs, de, fr, es, it, pl, pt, tr, ru, nl, ar, zh, ja, hu, ko
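The language codes above can also be driven in a loop. A minimal sketch, assuming `scripts/clonev.sh` behaves as described (the actual call is left commented out so the loop can be tried without the script installed):

```shell
# Batch-generate the same greeting in several supported languages.
SAMPLE="/path/to/sample.wav"
for lang in en cs de fr es; do
  # Pick a greeting for each language code
  case "$lang" in
    en) text="Hello" ;;
    cs) text="Ahoj" ;;
    de) text="Hallo" ;;
    fr) text="Bonjour" ;;
    es) text="Hola" ;;
  esac
  echo "Generating $lang: $text"
  # VOICE=$(scripts/clonev.sh "$text" "$SAMPLE" "$lang")   # uncomment to run
done
```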
Good samples: clear speech, little background noise, 6-30 seconds long, WAV format.
Bad samples: noisy or clipped recordings, clips much shorter than 6 seconds or longer than 30, music or multiple speakers in the background.
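If a recording is not already a short WAV, it can be converted first. A hedged sketch, assuming `ffmpeg` is installed (file names are placeholders); the helper only echoes the command it would run, so drop the `echo` to execute it:

```shell
# Build the ffmpeg command that trims a recording to 20 s and converts it
# to mono 22.05 kHz WAV, a shape well inside the 6-30 s window.
prep_sample() {
  local in="$1" out="$2"
  echo ffmpeg -y -i "$in" -t 20 -ac 1 -ar 22050 "$out"
}
prep_sample recording.m4a sample.wav
```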
**Script not found?** Make sure you're in the skill directory or use the full path:

```bash
/home/bernie/clawd/skills/clonev/scripts/clonev.sh "text" sample.wav en
```

**Sample not found?** Check that the file exists:

```bash
ls -la /path/to/sample.wav
```

**Model missing?** Models live under /mnt/c/TEMP/Docker-containers/coqui-tts/models-xtts/. The model should auto-download. If not, fetch it manually:

```bash
cd /mnt/c/TEMP/Docker-containers/coqui-tts
docker run --rm --entrypoint "" \
  -v $(pwd)/models-xtts:/root/.local/share/tts \
  ghcr.io/coqui-ai/tts:latest \
  python3 -c "from TTS.api import TTS; TTS('tts_models/multilingual/multi-dataset/xtts_v2')"
```
USER: "Clone my voice and say 'hello'"
→ Get: sample path, text="hello", language="en"
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "hello" "/path/to/sample.wav" en)
→ Result: $VOICE contains path to OGG file
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"

USER: "Make me speak Czech"
→ Get: sample path, text="Ahoj", language="cs"
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj" "/path/to/sample.wav" cs)
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"
Generated files are saved to:
/mnt/c/TEMP/Docker-containers/coqui-tts/output/clonev_output.ogg
The script returns this path, so you can use it directly.
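Because every run writes to the same clonev_output.ogg, each generation overwrites the last. A small helper (a sketch; `keep_voice` is a hypothetical name, not part of the skill) copies the result to a timestamped file before the next run clobbers it:

```shell
# Copy a generated voice file to a unique timestamped name under /tmp
# and print the new path.
keep_voice() {
  local src="$1"
  local dst="/tmp/clonev_$(date +%Y%m%d_%H%M%S).ogg"
  cp "$src" "$dst" && echo "$dst"
}
# Usage: KEPT=$(keep_voice "$VOICE")
```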
Use scripts/clonev.sh, not the coqui-xtts container. Simple. Just use the script.
Clone any voice. Speak any language. Just use the script.
Generated Mar 1, 2026
Businesses can clone a representative's voice to send personalized audio messages to customers, enhancing engagement and trust. This can be used for appointment reminders, promotional announcements, or support follow-ups in multiple languages.
Educational platforms can clone a teacher's voice to create multilingual audio content for language learners or visually impaired users. This allows for consistent, familiar voice output across different languages and materials.
Content creators can clone voices of characters or celebrities to generate voiceovers for videos, podcasts, or games. This enables rapid production of audio content without needing the original speaker present.
Developers can integrate this skill into AI assistants to allow users to clone their own voice for a more personalized interaction. This can be used in smart home devices, apps, or chatbots for a unique user experience.
Companies can clone a trainer's voice to produce multilingual training modules or onboarding materials. This ensures consistent messaging and reduces the need for live sessions across global teams.
Offer a free tier with limited voice cloning requests per month and premium plans for higher usage, advanced features, or commercial licensing. Revenue comes from subscription fees and enterprise contracts.
Provide an API that allows developers to integrate voice cloning into their applications, charging per API call or based on usage volume. This targets tech companies needing scalable voice generation solutions.
License the voice cloning technology to large organizations for internal use, such as customer service or training, with customization and support. Revenue is generated through one-time licensing fees and ongoing maintenance contracts.
💬 Integration Tip
Ensure voice samples are clear WAV files of 6-30 seconds and use the provided script directly to avoid Docker issues; test with short texts first to verify output quality.
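The 6-30 second check can be automated before cloning. A hedged sketch: `ffprobe` (shipped with ffmpeg) reads the duration, and the range check itself is plain shell; `sample_ok` is a hypothetical helper name.

```shell
# Return success only if the duration (whole seconds) is within 6-30 s.
sample_ok() {
  local dur="$1"
  [ "$dur" -ge 6 ] && [ "$dur" -le 30 ]
}

# Read the duration with ffprobe (uncomment when ffmpeg is installed):
# dur=$(ffprobe -v error -show_entries format=duration -of csv=p=0 sample.wav | cut -d. -f1)
sample_ok 12 && echo "sample length OK"
```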
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.
Text-to-speech via OpenAI Audio Speech API.