# jarvis-voice

Turn your AI into JARVIS. Voice, wit, and personality: the complete package. Humor cranked to maximum.
Install via ClawdBot CLI:

```shell
clawdbot install globalcaos/jarvis-voice
```

Remember JARVIS in the Iron Man films? Not just the voice – the _personality_. The bone-dry observations while Tony was mid-crisis. _"I do appreciate your concern, sir, but the suit is quite capable of–" [explosion] "–as I was saying."_ That effortless, understated humor that made you forget you were listening to software.
That's what this skill gives your OpenClaw agent. The voice: offline text-to-speech using sherpa-onnx (British Alan voice) with metallic audio processing via ffmpeg. And the humor: four research-backed comedy patterns (dry wit, self-aware AI, alien observer, literal idiom play) calibrated to make your agent sound like it's been running your life for years and is quietly amused by the experience.
The humor isn't bolted on. It's baked in. Because a JARVIS that speaks without wit is just Siri with better reverb.
The research behind the humor: LIMBIC – Computational Humor via Bisociation & Embedding Distances
## The built-in `tts` tool

The built-in `tts` tool uses Edge TTS (cloud, wrong voice, no effects). Always use the `jarvis` shell command instead.
Every response that warrants voice output must include BOTH:
```
exec(command='jarvis "Your spoken text here."', background=true)
```
This fires immediately – the user hears the voice BEFORE the text appears on screen.
```
**Jarvis:** *Your spoken text here.*
```
The webchat UI has custom CSS + JS that automatically detects `Jarvis:` and renders the following text in purple italic (the `.jarvis-voice` class, color #9b59b6). You just write the markdown – the styling is automatic.
This is called hybrid output: the user hears the voice first, then sees the transcript.
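Put together, a single agent turn combines the background call and the transcript line (the spoken text below is purely illustrative):

```
exec(command='jarvis "Diagnostics complete, sir. Everything is fine, which historically means very little."', background=true)

**Jarvis:** *Diagnostics complete, sir. Everything is fine, which historically means very little.*
```

The audio starts playing while the markdown is still streaming, which is what sells the effect.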
Note: The server-side `triggerJarvisAutoTts` hook is DISABLED (no-op). It fired too late (after text render). Voice comes exclusively from the `exec` call.
```shell
jarvis "Hello, this is a test"
```
The `jarvis` command:

- synthesizes speech offline with the Piper British voice (en_GB-alan-medium)
- speeds delivery up (--vits-length-scale=0.5)
- applies the metallic ffmpeg processing, plays via aplay to the default audio device, then cleans up temp files

## Webchat rendering

The OpenClaw webchat has built-in support for Jarvis voice transcripts:
- `ui/src/styles/chat/text.css` – `.jarvis-voice` class renders purple italic (#9b59b6 dark, #8e44ad light theme)
- `ui/src/ui/markdown.ts` – post-render hook auto-wraps text after `Jarvis:` in a styled element

This means you just write `Jarvis:` text in markdown and the webchat handles the purple rendering. No extra markup needed.
For non-webchat surfaces (WhatsApp, Telegram, etc.), the bold/italic markdown renders natively – no purple, but still visually distinct.
Requires:

- sherpa-onnx runtime at `~/.openclaw/tools/sherpa-onnx-tts/`
- en_GB-alan-medium voice model at `~/.openclaw/tools/sherpa-onnx-tts/models/vits-piper-en_GB-alan-medium/`
- `ffmpeg` installed system-wide
- `aplay` (ALSA) for audio playback
- `jarvis` script at `~/.local/bin/jarvis` (or in PATH)

## The `jarvis` script

```shell
#!/bin/bash
# Jarvis TTS - authentic JARVIS-style voice
# Usage: jarvis "Hello, this is a test"
export LD_LIBRARY_PATH=$HOME/.openclaw/tools/sherpa-onnx-tts/lib:$LD_LIBRARY_PATH

RAW_WAV="/tmp/jarvis_raw.wav"
FINAL_WAV="/tmp/jarvis_final.wav"

# Generate speech
$HOME/.openclaw/tools/sherpa-onnx-tts/bin/sherpa-onnx-offline-tts \
  --vits-model=$HOME/.openclaw/tools/sherpa-onnx-tts/models/vits-piper-en_GB-alan-medium/en_GB-alan-medium.onnx \
  --vits-tokens=$HOME/.openclaw/tools/sherpa-onnx-tts/models/vits-piper-en_GB-alan-medium/tokens.txt \
  --vits-data-dir=$HOME/.openclaw/tools/sherpa-onnx-tts/models/vits-piper-en_GB-alan-medium/espeak-ng-data \
  --vits-length-scale=0.5 \
  --output-filename="$RAW_WAV" \
  "$@" >/dev/null 2>&1

# Apply JARVIS metallic processing
if [ -f "$RAW_WAV" ]; then
  ffmpeg -y -i "$RAW_WAV" \
    -af "asetrate=22050*1.05,aresample=22050,flanger=delay=0:depth=2:regen=50:width=71:speed=0.5,aecho=0.8:0.88:15:0.5,highpass=f=200,treble=g=6" \
    "$FINAL_WAV" -v error
  if [ -f "$FINAL_WAV" ]; then
    aplay -D plughw:0,0 -q "$FINAL_WAV"
    rm "$RAW_WAV" "$FINAL_WAV"
  fi
fi
```
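Before relying on voice output, it helps to sanity-check the dependencies above. A minimal preflight sketch, assuming the default install paths from the Requires list:

```shell
#!/bin/bash
# Preflight check for the jarvis voice pipeline (a sketch; paths assume the
# default install locations listed above).
TTS_DIR="$HOME/.openclaw/tools/sherpa-onnx-tts"
MODEL_DIR="$TTS_DIR/models/vits-piper-en_GB-alan-medium"

missing=0
# Commands the pipeline shells out to
for cmd in ffmpeg aplay jarvis; do
  command -v "$cmd" >/dev/null 2>&1 || { echo "missing command: $cmd"; missing=1; }
done
# Runtime binary and model files
for f in "$TTS_DIR/bin/sherpa-onnx-offline-tts" \
         "$MODEL_DIR/en_GB-alan-medium.onnx" \
         "$MODEL_DIR/tokens.txt"; do
  [ -e "$f" ] || { echo "missing file: $f"; missing=1; }
done

if [ "$missing" -eq 0 ]; then
  echo "jarvis preflight: OK"
else
  echo "jarvis preflight: FAILED"
fi
```

Run it once after install; a clean "OK" means the first `jarvis` call should produce audio rather than silently failing.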
For WhatsApp, output must be OGG/Opus format instead of speaker playback:

```shell
sherpa-onnx-offline-tts --vits-length-scale=0.5 --output-filename=raw.wav "text"
ffmpeg -i raw.wav \
  -af "asetrate=22050*1.05,aresample=22050,flanger=delay=0:depth=2:regen=50:width=71:speed=0.5,aecho=0.8:0.88:15:0.5,highpass=f=200,treble=g=6" \
  -c:a libopus -b:a 64k output.ogg
```
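That two-step pipeline can be wrapped into a small helper. A sketch, with the hypothetical name `jarvis_ogg` and the same model paths the main script assumes:

```shell
#!/bin/bash
# jarvis_ogg - hypothetical helper: synthesize text with the same model and
# metallic effects as the jarvis script, but write OGG/Opus for WhatsApp
# voice notes instead of playing through the speakers.
# Usage: jarvis_ogg output.ogg "Text to speak"
jarvis_ogg() {
  local out="$1"; shift
  local tts_dir="$HOME/.openclaw/tools/sherpa-onnx-tts"
  local model_dir="$tts_dir/models/vits-piper-en_GB-alan-medium"
  local raw
  raw="$(mktemp --suffix=.wav)" || return 1

  # Generate speech to a temp WAV
  LD_LIBRARY_PATH="$tts_dir/lib:$LD_LIBRARY_PATH" \
  "$tts_dir/bin/sherpa-onnx-offline-tts" \
    --vits-model="$model_dir/en_GB-alan-medium.onnx" \
    --vits-tokens="$model_dir/tokens.txt" \
    --vits-data-dir="$model_dir/espeak-ng-data" \
    --vits-length-scale=0.5 \
    --output-filename="$raw" \
    "$@" >/dev/null 2>&1 || { rm -f "$raw"; return 1; }

  # Same metallic chain as the speaker pipeline, encoded to Opus instead of played
  ffmpeg -y -i "$raw" \
    -af "asetrate=22050*1.05,aresample=22050,flanger=delay=0:depth=2:regen=50:width=71:speed=0.5,aecho=0.8:0.88:15:0.5,highpass=f=200,treble=g=6" \
    -c:a libopus -b:a 64k "$out" -v error
  local rc=$?
  rm -f "$raw"
  return $rc
}
```

Usage: `jarvis_ogg reply.ogg "On my way, sir."`, then hand `reply.ogg` to your messaging channel as a voice note.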
jarvis-voice gives your agent a voice. Pair it with ai-humor-ultimate and you give it a _soul_: dry wit, contextual humor, the kind of understated sarcasm that makes you smirk at your own terminal.
This pairing is part of a 12-skill cognitive architecture we've been building: voice, humor, memory, reasoning, and more. Research papers included, because we're that kind of obsessive.
Explore the full project: github.com/globalcaos/clawdbot-moltbot-openclaw
Clone it. Fork it. Break it. Make it yours.
For voice to work consistently across new sessions, copy the templates to your workspace root:
```shell
cp {baseDir}/templates/VOICE.md ~/.openclaw/workspace/VOICE.md
cp {baseDir}/templates/SESSION.md ~/.openclaw/workspace/SESSION.md
cp {baseDir}/templates/HUMOR.md ~/.openclaw/workspace/HUMOR.md
```
These three files are auto-loaded by OpenClaw's workspace injection. The agent will speak from the very first reply of every session.
| File | Purpose |
|------|---------|
| bin/jarvis | The TTS + effects script (portable, uses $SHERPA_ONNX_TTS_DIR) |
| templates/VOICE.md | Voice enforcement rules (copy to workspace root) |
| templates/SESSION.md | Session start with voice greeting (copy to workspace root) |
| templates/HUMOR.md | Humor config: four patterns, frequency 1.0 (copy to workspace root) |
Generated Mar 1, 2026
Enhance customer support with a witty, British-accented AI that provides vocal responses to user inquiries. The voice adds personality to automated support, making interactions more engaging and memorable while delivering clear instructions or summaries.
Use Jarvis Voice to create an AI tutor that explains concepts in a humorous, engaging manner. The offline TTS ensures privacy for learning sessions, and the voice's wit keeps students entertained while reinforcing key points through spoken summaries.
Integrate Jarvis Voice into home automation systems for vocal feedback on device status or commands. The metallic audio effects and dry humor make the AI feel like a futuristic assistant, enhancing user experience in controlling lights, appliances, or security systems.
Implement Jarvis Voice in video games or interactive experiences as a non-player character that provides vocal commentary. The humor patterns and British accent add depth to storytelling, making the AI feel like a sarcastic ally or observer within the game world.
Deploy Jarvis Voice for medication or appointment reminders in healthcare apps. The voice's clarity and engaging personality improve patient adherence, with offline operation ensuring data privacy and reliability in sensitive environments.
Offer Jarvis Voice as a cloud or on-premise API for developers to integrate into their applications. Charge monthly or annual fees based on usage tiers, targeting businesses that want to add personality to their AI services without developing voice tech in-house.
Sell perpetual licenses for embedding Jarvis Voice into hardware devices like smart speakers or kiosks. This model appeals to manufacturers seeking a unique, offline voice personality, with revenue from upfront payments and optional support contracts.
Provide a basic version of Jarvis Voice for free with limited humor patterns or voice effects, then upsell to a premium tier with advanced customization, multiple voices, or higher character limits. This attracts hobbyists and small businesses before converting them to paying customers.
Integration Tip
Ensure all dependencies like ffmpeg and aplay are installed, and test the jarvis command in a sandbox environment before deployment to verify audio playback and humor calibration.