voice-reply

Local text-to-speech using Piper voices via sherpa-onnx. 100% offline, no API keys required. Use when user asks for a voice reply, audio response, spoken answer, or wants to hear something read aloud. Supports multiple languages including German (thorsten) and English (ryan) voices. Outputs Telegram-compatible voice notes with [[audio_as_voice]] tag.
Install via ClawdBot CLI:
clawdbot install stolot0mt0m/voice-reply

Generate voice audio replies using local Piper TTS via sherpa-onnx. Completely offline, no cloud APIs needed.
cd scripts
sudo ./install.sh
sudo mkdir -p /opt/sherpa-onnx
cd /opt/sherpa-onnx
# /opt/sherpa-onnx was created with sudo, so downloading and deleting there also need sudo
sudo curl -L -o sherpa.tar.bz2 "https://github.com/k2-fsa/sherpa-onnx/releases/download/v1.12.23/sherpa-onnx-v1.12.23-linux-x64-shared.tar.bz2"
sudo tar -xjf sherpa.tar.bz2 --strip-components=1
sudo rm sherpa.tar.bz2
sudo mkdir -p /opt/piper-voices
cd /opt/piper-voices
# German - thorsten (medium quality, natural male voice)
sudo curl -L -o thorsten.tar.bz2 "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-thorsten-medium.tar.bz2"
sudo tar -xjf thorsten.tar.bz2 && sudo rm thorsten.tar.bz2
# English - ryan (high quality, clear US male voice)
sudo curl -L -o ryan.tar.bz2 "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_US-ryan-high.tar.bz2"
sudo tar -xjf ryan.tar.bz2 && sudo rm ryan.tar.bz2
sudo apt install -y ffmpeg
Add to your OpenClaw service or shell:
export SHERPA_ONNX_DIR="/opt/sherpa-onnx"
export PIPER_VOICES_DIR="/opt/piper-voices"
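With both variables set, the synthesis step plausibly comes down to two commands: sherpa-onnx renders a WAV file from the text, and ffmpeg re-encodes it as OGG Opus. The following is a minimal sketch of such a pipeline, not the skill's actual script. The function name `synthesize`, the 32k bitrate, the temporary file layout, and the `LD_LIBRARY_PATH` handling for the shared-library build are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: synthesize() is a hypothetical helper, not part of the skill.
# Usage: synthesize "text" /path/to/voice-dir /path/to/output.ogg
synthesize() {
  local text="$1" voice_dir="$2" out_ogg="$3"
  local model tmpdir wav status

  # Pick the first .onnx file in the voice directory as the model.
  model=$(find "$voice_dir" -maxdepth 1 -name '*.onnx' 2>/dev/null | head -n 1)
  if [ -z "$model" ]; then
    echo "no .onnx model found in $voice_dir" >&2
    return 1
  fi

  tmpdir=$(mktemp -d) || return 1
  wav="$tmpdir/speech.wav"

  # Assumption: the shared build needs its lib/ directory on the loader path.
  # Step 1 writes a WAV file; step 2 re-encodes it as OGG Opus.
  LD_LIBRARY_PATH="$SHERPA_ONNX_DIR/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
    "$SHERPA_ONNX_DIR/bin/sherpa-onnx-offline-tts" \
      --vits-model="$model" \
      --vits-tokens="$voice_dir/tokens.txt" \
      --vits-data-dir="$voice_dir/espeak-ng-data" \
      --output-filename="$wav" \
      "$text" \
    && ffmpeg -y -loglevel error -i "$wav" -c:a libopus -b:a 32k "$out_ogg"
  status=$?

  # Clean up the intermediate WAV regardless of success.
  rm -rf "$tmpdir"
  return "$status"
}
```

A call might look like `synthesize "Hello" "$PIPER_VOICES_DIR/vits-piper-en_US-ryan-high" /tmp/voice-reply-output.ogg`, assuming the archive unpacked into a directory named after itself.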
{baseDir}/bin/voice-reply "Text to speak" [language]
| Parameter | Description | Default |
|-----------|-------------|---------|
| text | The text to convert to speech | (required) |
| language | de for German, en for English | auto-detect |
# German (explicit)
{baseDir}/bin/voice-reply "Hallo, ich bin dein Assistent!" de
# English (explicit)
{baseDir}/bin/voice-reply "Hello, I am your assistant!" en
# Auto-detect (detects German from umlauts and common words)
{baseDir}/bin/voice-reply "Guten Tag, wie geht es dir?"
# Auto-detect (defaults to English)
{baseDir}/bin/voice-reply "The weather is nice today."
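The auto-detection described in the examples above (umlauts and common words indicate German, everything else falls back to English) could be sketched as follows. The function name `detect_lang` and the specific word list are assumptions for illustration; the skill's own script may use different rules.

```shell
#!/usr/bin/env bash
# Sketch only: the real script's word list and rules may differ.
# Prints "de" if the text looks German, otherwise "en".
detect_lang() {
  local text="$1" words

  # Any umlaut or sharp s is treated as a strong hint for German.
  case "$text" in
    *ä*|*ö*|*ü*|*Ä*|*Ö*|*Ü*|*ß*) echo "de"; return 0 ;;
  esac

  # Lowercase ASCII letters, turn every other byte into a space, and pad
  # with spaces so that whole words can be matched as " word ".
  words=" $(printf '%s' "$text" | LC_ALL=C tr 'A-Z' 'a-z' | LC_ALL=C tr -c 'a-z' ' ') "

  # Hypothetical list of frequent German words.
  case "$words" in
    *" und "*|*" ich "*|*" ist "*|*" nicht "*|*" wie "*|*" guten "*|*" der "*|*" das "*)
      echo "de" ;;
    *)
      echo "en" ;;
  esac
}
```

A heuristic like this is cheap and needs no extra dependencies, but short or mixed-language inputs can be misclassified, so passing the language explicitly remains the safer choice.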
The script outputs two lines that OpenClaw processes for Telegram:
[[audio_as_voice]]
MEDIA:/tmp/voice-reply-output.ogg
[[audio_as_voice]] - Tag that tells Telegram to display as voice bubble
MEDIA:path - Path to the generated OGG Opus audio file

| Language | Voice | Quality | Description |
|----------|-------|---------|-------------|
| German (de) | thorsten | medium | Natural male voice, clear pronunciation |
| English (en) | ryan | high | Clear US male voice, professional tone |
Browse available Piper voices at:
Download and extract to $PIPER_VOICES_DIR, then modify the script to include the new voice.
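Including a new voice most likely means extending whatever mapping the script uses from language code to model directory. A minimal sketch of such a mapping is given here. The function name `voice_dir_for_lang` is hypothetical, the directory names assume each archive unpacks into a folder named after itself, and the commented `xx` entry is a placeholder rather than a real voice.

```shell
#!/usr/bin/env bash
# Sketch only: voice_dir_for_lang() is a hypothetical helper.
# Directory names assume each archive unpacks into a folder named after it.
voice_dir_for_lang() {
  local base="${PIPER_VOICES_DIR:-/opt/piper-voices}"
  case "$1" in
    de) echo "$base/vits-piper-de_DE-thorsten-medium" ;;
    en) echo "$base/vits-piper-en_US-ryan-high" ;;
    # New voices go here, for example (placeholder name, not a real model):
    # xx) echo "$base/vits-piper-xx_XX-name-quality" ;;
    *)  echo "unsupported language: $1" >&2; return 1 ;;
  esac
}
```

Keeping the mapping in one `case` statement means a new voice is a one-line change, and unsupported codes fail loudly instead of silently falling back.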
Ensure SHERPA_ONNX_DIR is set and contains bin/sherpa-onnx-offline-tts.
Check that voice model files exist: *.onnx, tokens.txt, espeak-ng-data/
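The two checks above can be automated with a small diagnostic function. This is a sketch under assumptions: `check_install` is a hypothetical name, and it takes the voice directory as an argument because the exact directory layout under `$PIPER_VOICES_DIR` depends on how the archives were extracted.

```shell
#!/usr/bin/env bash
# Sketch only: check_install() is a hypothetical diagnostic helper.
# Usage: check_install /path/to/voice-dir
# Prints one line per problem and returns non-zero if any were found.
check_install() {
  local voice_dir="$1" problems=0 found=0 f

  # The TTS binary must exist and be executable.
  if [ ! -x "${SHERPA_ONNX_DIR:-}/bin/sherpa-onnx-offline-tts" ]; then
    echo "missing: sherpa-onnx-offline-tts under SHERPA_ONNX_DIR/bin"
    problems=$((problems + 1))
  fi

  # At least one .onnx model must exist in the voice directory.
  for f in "$voice_dir"/*.onnx; do
    if [ -f "$f" ]; then found=1; fi
  done
  if [ "$found" -eq 0 ]; then
    echo "missing: *.onnx model in $voice_dir"
    problems=$((problems + 1))
  fi

  if [ ! -f "$voice_dir/tokens.txt" ]; then
    echo "missing: $voice_dir/tokens.txt"
    problems=$((problems + 1))
  fi

  if [ ! -d "$voice_dir/espeak-ng-data" ]; then
    echo "missing: $voice_dir/espeak-ng-data/"
    problems=$((problems + 1))
  fi

  [ "$problems" -eq 0 ]
}
```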
Ensure the output includes [[audio_as_voice]] tag on its own line before the MEDIA: line.
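To verify the output format mechanically, a small filter can read the script's output and confirm that the tag appears on its own line before a `MEDIA:` line. This is a sketch, and `check_output` is a hypothetical helper, not something the skill ships.

```shell
#!/usr/bin/env bash
# Sketch only: check_output() is a hypothetical helper.
# Reads voice-reply output on stdin; succeeds if [[audio_as_voice]] appears
# on its own line and a later line starts with "MEDIA:" plus a path.
check_output() {
  local line seen_tag=0
  # The "|| [ -n ... ]" part also handles a final line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$line" = "[[audio_as_voice]]" ]; then
      seen_tag=1
    elif [ "$seen_tag" -eq 1 ]; then
      case "$line" in
        MEDIA:?*) return 0 ;;
      esac
    fi
  done
  return 1
}
```

For example, `{baseDir}/bin/voice-reply "Hello" en | check_output && echo "format OK"` would report success only when both lines are present in the right order.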
Generated Mar 1, 2026
Deploy in regions with poor internet connectivity to provide voice-based customer support without cloud dependencies. Enables automated voice responses for FAQs or status updates, ensuring service continuity.
Integrate into language learning apps or e-learning platforms to generate spoken content locally, protecting user data. Useful for pronunciation guides or audio lessons in German and English.
Create a Telegram bot that sends voice alerts or reminders as voice bubbles, leveraging the offline TTS for real-time updates. Ideal for personal productivity or community announcements.
Use in accessibility tools to convert text to speech locally, offering immediate audio feedback without internet reliance. Supports multiple languages for broader user reach.
Embed in self-service kiosks or POS systems to provide voice instructions or promotions in German or English. Reduces cloud costs and latency in high-traffic environments.
Offer paid consulting or customization services for integrating the skill into specific applications, such as adding new voice models or optimizing for different OS environments. Revenue comes from one-time fees or retainer contracts.
Package the skill as part of a subscription-based software suite for enterprises needing offline TTS capabilities, with added features like analytics or multi-platform support. Revenue is generated through monthly or annual subscriptions.
License the skill to hardware manufacturers for integration into IoT devices like smart speakers or industrial monitors, enabling voice output without internet. Revenue comes from licensing fees per device sold.
💬 Integration Tip
Ensure environment variables are correctly set in deployment scripts and test voice model paths to avoid common installation errors.