vocal-chat
Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with locally generated TTS audio. Use when the user wants to "talk" instead of type.
Install via ClawdBot CLI:
clawdbot install rubenfb23/vocal-chat
This skill automates the voice-to-voice loop on WhatsApp using local transcription and local TTS.
1. Transcribe the incoming audio with tools/transcribe_voice.sh to get the text.
2. Generate the spoken reply with bin/sherpa-onnx-tts.
3. Send the resulting .ogg file back to the user as a voice note.
To respond with voice manually:
bin/sherpa-onnx-tts /tmp/reply.ogg "Tu mensaje aquí"
Then send /tmp/reply.ogg with the message tool, passing the path as filePath.
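The manual reply step can be wrapped in a small helper so a failed or empty synthesis is caught before anything is sent. This is a minimal sketch, not part of the skill: the function name make_voice_reply and the TTS_BIN variable are hypothetical, and the only thing taken from the skill is the argument order shown above (output path first, then the text).

```shell
# Path to the skill's TTS wrapper; TTS_BIN is a hypothetical override
# that makes it possible to point at a different binary or a test stub.
TTS_BIN="${TTS_BIN:-bin/sherpa-onnx-tts}"

# make_voice_reply TEXT [OUTPUT.ogg]
# Synthesizes TEXT and prints the path of the audio file on success.
make_voice_reply() {
  local text="$1"
  local out="${2:-/tmp/reply.ogg}"

  if [ -z "$text" ]; then
    echo "usage: make_voice_reply TEXT [OUTPUT.ogg]" >&2
    return 2
  fi

  # Same argument order as the manual command: output file, then text.
  "$TTS_BIN" "$out" "$text" || return 1

  # Refuse to hand back a missing or zero-byte file.
  if [ ! -s "$out" ]; then
    echo "TTS produced no audio at $out" >&2
    return 1
  fi

  printf '%s\n' "$out"
}
```

The printed path is what would then be passed to the message tool as filePath. Checking for a non-empty file is a design choice here: sending a zero-byte voice note is harder to diagnose than failing early.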
Generated Feb 23, 2026
Enables businesses to handle customer inquiries via voice on WhatsApp, automatically transcribing audio queries and responding with synthesized voice. Useful for support teams in retail or telecom to reduce typing and improve accessibility for users preferring verbal communication.
Facilitates interactive voice conversations for language practice, where learners send audio messages and receive spoken responses. Helps in pronunciation and listening skills, ideal for edtech platforms integrating WhatsApp for immersive learning experiences.
Automates voice-based reminders for medical appointments via WhatsApp, transcribing patient confirmations and sending audio notifications. Enhances engagement in healthcare settings, especially for elderly or visually impaired patients who find voice more convenient.
Supports voice communication for field technicians using WhatsApp to report issues and receive instructions hands-free. Transcribes audio updates and provides spoken guidance, improving efficiency in industries like utilities or logistics.
Offer the skill as a subscription service for businesses to integrate voice chat on WhatsApp, charging monthly fees based on usage volume. Revenue comes from tiered plans for small to large enterprises needing automated voice support.
Monetize by providing API access to the transcription and TTS tools, charging per audio message processed. Targets developers and companies looking to add voice features without building infrastructure, generating revenue from transaction-based pricing.
License the skill as a customizable white-label product for resellers or large organizations to brand as their own. Revenue is generated through one-time licensing fees or annual contracts, appealing to telecoms or app developers.
💬 Integration Tip
Ensure local tools such as ffmpeg and whisper-cpp are installed and configured so that audio processing is fast enough to meet real-time factor (RTF) constraints.
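A quick pre-flight check along these lines can confirm the tools are on PATH before the skill is used. This is a sketch under assumptions: require_tools is a hypothetical helper, and the executable name for whisper-cpp varies between builds and package managers, so the names passed in should match the local installation.

```shell
# require_tools NAME...
# Prints one "missing: NAME" line to stderr for each command not found
# on PATH and returns non-zero if any are missing.
require_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call (the whisper-cpp executable name is an assumption):
#   require_tools ffmpeg whisper-cpp || echo "install the missing tools first"
```

This only verifies that the commands exist; it does not measure processing speed, so RTF still has to be checked against real audio.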
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.
Text-to-speech via OpenAI Audio Speech API.