audio-reply-skill
Generate audio replies using TTS. Trigger with "read it to me [public URL]" to fetch and read content aloud, or "talk to me [topic]" to generate a spoken response.
Install via ClawdBot CLI:
clawdbot install MaTriXy/audio-reply-skill

Generate spoken audio responses using MLX Audio TTS (chatterbox-turbo model).
Safety Guardrails: fetch only http:// or https:// URLs. Reject localhost, *.local, and private or reserved ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, ::1, fc00::/7).

Usage:
User: read it to me https://example.com/article
User: talk to me about the weather today
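The guardrails above can be sketched as a small pre-flight check. This is a minimal sketch using simple pattern matching: it does not resolve DNS or do full CIDR parsing, and the function name `is_safe_url` is illustrative, not part of the skill.

```shell
#!/bin/sh
# Sketch of the URL guardrails: allow only http(s) URLs and reject
# localhost, *.local, and the listed private/reserved ranges.
# Simplified pattern match - assume the real skill validates more strictly.
is_safe_url() {
  url="$1"
  case "$url" in
    http://*|https://*) ;;   # only http/https schemes are allowed
    *) return 1 ;;
  esac
  host=${url#*://}           # strip scheme
  host=${host%%/*}           # strip path
  case "$host" in
    \[*) host="${host%%\]*}]" ;;   # keep IPv6 literal, e.g. [::1]
    *)   host=${host%%:*} ;;       # strip :port from plain hosts
  esac
  case "$host" in
    localhost|*.local) return 1 ;;
    127.*|10.*|192.168.*|169.254.*) return 1 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 1 ;;  # 172.16.0.0/12
    "[::1]"|\[[fF][cCdD]*) return 1 ;;                  # ::1 and fc00::/7
  esac
  return 0
}
```

A caller would run `is_safe_url "$url" || exit 1` before handing the URL to WebFetch.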
uv run mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "Your text here" \
  --play \
  --file_prefix /tmp/audio_reply
--model mlx-community/chatterbox-turbo-fp16 - Fast, natural voice
--play - Auto-play the generated audio
--file_prefix - Save to temp location for cleanup
--exaggeration 0.3 - Optional: add expressiveness (0.0-1.0)
--speed 1.0 - Adjust speech rate if needed

For "read it to me" mode:
For "talk to me" mode:
Always delete temporary files after playback. Generated audio or referenced text may be retained by the chat client history, so avoid processing sensitive sources.
# Generate with unique filename and play
OUTPUT_FILE="/tmp/audio_reply_$(date +%s)"
uv run mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "Your response text" \
  --play \
  --file_prefix "$OUTPUT_FILE"

# ALWAYS clean up after playing
rm -f "${OUTPUT_FILE}"*.wav 2>/dev/null
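To guarantee the cleanup step runs even when generation or playback fails partway, the same flow can be wrapped in a subshell with an EXIT trap. This is a sketch: the stand-in `: > "${OUTPUT_FILE}.wav"` line merely simulates the TTS command writing a file, so the trap behavior can be seen without invoking the model.

```shell
#!/bin/sh
# Cleanup sketch: run the generate-and-play step inside a subshell with an
# EXIT trap, so temporary .wav files are removed on every exit path
# (success, error, or interrupt), not only when the happy path reaches rm.
OUTPUT_FILE="/tmp/audio_reply_$(date +%s)"
(
  trap 'rm -f "${OUTPUT_FILE}"*.wav' EXIT INT TERM
  # The real skill runs the uv/mlx_audio command here;
  # this stand-in just creates a file to demonstrate the trap.
  : > "${OUTPUT_FILE}.wav"
  # ... playback happens here ...
)
# By the time the subshell exits, the trap has already deleted the audio.
[ ! -e "${OUTPUT_FILE}.wav" ] && echo "cleaned up"
```

The subshell keeps the trap scoped to the audio step, so a long-running parent process does not accumulate traps.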
If TTS fails, check that:
- uv is installed and in PATH

Example ("read it to me" mode):
User: read it to me https://blog.example.com/new-feature
Assistant actions:
1. Validate URL against Safety Guardrails, then WebFetch the URL
2. Extract article content
3. Generate TTS:
   uv run mlx_audio.tts.generate \
     --model mlx-community/chatterbox-turbo-fp16 \
     --text "Here's what I found... [article summary]" \
     --play --file_prefix /tmp/audio_reply_1706123456
4. Delete: rm -f /tmp/audio_reply_1706123456*.wav
5. Confirm: "Done reading the article to you."
Example ("talk to me" mode):
User: talk to me about what you can help with
Assistant actions:
1. Generate conversational response text
2. Generate TTS:
   uv run mlx_audio.tts.generate \
     --model mlx-community/chatterbox-turbo-fp16 \
     --text "Hey! So I can help you with all kinds of things..." \
     --play --file_prefix /tmp/audio_reply_1706123789
3. Delete: rm -f /tmp/audio_reply_1706123789*.wav
4. (No text output needed - audio IS the response)
Note: the --play flag uses system audio - ensure volume is up.

Generated Mar 1, 2026
This skill enables visually impaired individuals to have web content read aloud by simply providing a public URL, enhancing digital accessibility. It can be integrated into assistive technology platforms to provide real-time audio summaries of articles, news, or documentation, supporting independent information consumption.
Businesses can use this skill to generate conversational audio replies for customer inquiries, such as explaining product features or answering FAQs, providing a more engaging and personal touch. It reduces reliance on pre-recorded messages by dynamically generating natural-sounding responses based on user topics.
Educators and e-learning platforms can leverage this skill to convert online educational materials, like blog posts or tutorials, into audio format for students who prefer auditory learning. It allows for quick summarization and narration of public resources, making study sessions more flexible and accessible.
Media companies can integrate this skill to offer audio versions of news articles or reports, enabling users to listen to updates hands-free while commuting or multitasking. It fetches public URLs, extracts key content, and delivers concise spoken summaries, expanding audience reach through audio formats.
Developers can embed this skill into smart home devices or IoT applications to provide voice-based interactions, such as reading weather updates or answering general knowledge questions. It uses TTS to generate natural responses, enhancing user experience in environments where screen interaction is limited.
Offer a monthly subscription for individuals or organizations needing enhanced audio access to web content, with tiered plans based on usage limits or premium features like faster processing. Revenue comes from recurring fees, targeting educational institutions, libraries, and corporate accessibility programs.
License the TTS functionality as an API for developers to integrate into their applications, charging per request or through monthly API usage tiers. This model generates revenue from tech companies building voice-enabled apps, customer service bots, or educational tools that require audio output.
Provide a free basic version for personal use with limited features, and offer premium upgrades for businesses, such as advanced summarization, multiple language support, or ad-free audio. Revenue is generated through in-app purchases, ads in free tiers, and enterprise partnerships.
💬 Integration Tip
Ensure the system has uv installed and sufficient storage for the TTS model; prioritize public URLs and implement strict safety checks to avoid fetching sensitive data.
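The integration tip above can be turned into a small preflight script. This is a sketch: the `preflight` name and the ~2 GB free-space threshold are illustrative assumptions, not documented requirements of the chatterbox-turbo model.

```shell
#!/bin/sh
# Preflight sketch for the integration tip: check that uv is on PATH and
# that /tmp has headroom for the TTS model and generated audio.
# The 2 GB threshold is an illustrative guess.
preflight() {
  if ! command -v uv >/dev/null 2>&1; then
    echo "preflight: uv not found in PATH" >&2
    return 1
  fi
  # Available kilobytes in /tmp (df -kP: column 4 is "Available")
  free_kb=$(df -kP /tmp | awk 'NR==2 {print $4}')
  if [ "${free_kb:-0}" -lt 2097152 ]; then
    echo "preflight: less than ~2 GB free in /tmp" >&2
    return 1
  fi
}
```

Running `preflight || exit 1` before the first TTS call surfaces environment problems as a clear message instead of a mid-generation failure.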