whatsapp-voice-chat-integration-open-source

Real-time WhatsApp voice message processing. Transcribe voice notes to text via Whisper, detect intent, execute handlers, and send responses. Use when building conversational voice interfaces for WhatsApp. Supports English and Hindi, customizable intents (weather, status, commands), automatic language detection, and streaming responses via TTS.
Install via the Clawdbot CLI:
clawdbot install syedateebulislam/whatsapp-voice-chat-integration-open-source

Turn WhatsApp voice messages into real-time conversations. This skill provides a complete pipeline: voice → transcription → intent detection → response generation → text-to-speech.
Perfect for:
pip install openai-whisper soundfile numpy
const { processVoiceNote } = require('./scripts/voice-processor');
const fs = require('fs');
// Read a voice message (OGG, WAV, MP3, etc.)
const buffer = fs.readFileSync('voice-message.ogg');
// Process it
const result = await processVoiceNote(buffer);
console.log(result);
// {
// status: 'success',
// response: "Current weather in Delhi is 19°C, haze. Humidity is 56%.",
// transcript: "What's the weather today?",
// intent: 'weather',
// language: 'en',
// timestamp: 1769860205186
// }
For automatic processing of incoming WhatsApp voice messages:
node scripts/voice-listener-daemon.js
This watches ~/.clawdbot/media/inbound/ every 5 seconds and processes new voice files.
Incoming Voice Message
↓
Transcribe (Whisper API)
↓
"What's the weather?"
↓
Detect Language & Intent
↓
Match against INTENTS
↓
Execute Handler
↓
Generate Response
↓
Convert to TTS
↓
Send back via WhatsApp
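The steps above can be sketched as one async pipeline. The step functions here (transcribe, detectIntent, executeHandler) are simplified stand-ins for the real implementations in voice-processor.js, with hard-coded results so the flow itself is visible:

```javascript
// Stand-in for the Whisper step (the real version shells out to transcribe.py).
async function transcribe(audioBuffer) {
  return { text: "What's the weather today?", language: 'en' };
}

// Stand-in for keyword matching against the INTENTS table.
function detectIntent(text) {
  return /weather/i.test(text) ? 'weather' : 'unknown';
}

// Stand-in for dispatching to the matched intent's handler.
async function executeHandler(intent, language) {
  return intent === 'weather'
    ? 'Current weather in Delhi is 19°C, haze.'
    : "Sorry, I didn't understand that.";
}

// The pipeline: voice buffer -> transcript -> intent -> response.
async function processVoiceNote(audioBuffer) {
  const { text, language } = await transcribe(audioBuffer);
  const intent = detectIntent(text);
  const response = await executeHandler(intent, language);
  return { status: 'success', transcript: text, intent, language, response };
}
```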
✅ Zero Setup Complexity - No FFmpeg, no complex dependencies. Uses soundfile + Whisper.
✅ Multi-Language - Automatic English/Hindi detection. Extend easily.
✅ Intent-Driven - Define custom intents with keywords and handlers.
✅ Real-Time Processing - 5-10 seconds per message (after first model load).
✅ Customizable - Add weather, status, commands, or anything else.
✅ Production Ready - Built from real usage in Clawdbot.
// User says: "What's the weather in Bangalore?"
// Response: "Current weather in Delhi is 19°C..."
// (Built-in intent, just enable it)
// User says: "Turn on the lights"
// Handler: Sends signal to smart home API
// Response: "Lights turned on"
// User says: "Add milk to shopping list"
// Handler: Adds to database
// Response: "Added milk to your list"
// User says: "Is the system running?"
// Handler: Checks system status
// Response: "All systems online"
Edit voice-processor.js:
const INTENTS = {
'shopping': {
keywords: ['shopping', 'list', 'buy', 'खरीद'],
handler: 'handleShopping'
}
};
const handlers = {
async handleShopping(language = 'en') {
return {
status: 'success',
response: language === 'en'
? "What would you like to add to your shopping list?"
: "आप अपनी शॉपिंग लिस्ट में क्या जोड़ना चाहते हैं?"
};
}
};
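Since INTENTS stores each handler as a string, the glue between the two snippets above is a keyword scan plus a map lookup. A sketch of that wiring; matchIntent and dispatch are hypothetical helper names, not the verbatim source:

```javascript
const INTENTS = {
  shopping: { keywords: ['shopping', 'list', 'buy', 'खरीद'], handler: 'handleShopping' },
};

const handlers = {
  async handleShopping(language = 'en') {
    return {
      status: 'success',
      response: language === 'en'
        ? 'What would you like to add to your shopping list?'
        : 'आप अपनी शॉपिंग लिस्ट में क्या जोड़ना चाहते हैं?',
    };
  },
};

// Return the name of the first intent whose keywords appear in the transcript.
function matchIntent(transcript) {
  const text = transcript.toLowerCase();
  for (const [name, def] of Object.entries(INTENTS)) {
    if (def.keywords.some(k => text.includes(k.toLowerCase()))) return name;
  }
  return null;
}

// Look up the handler named by the matched intent and invoke it.
async function dispatch(transcript, language = 'en') {
  const name = matchIntent(transcript);
  const handler = name && handlers[INTENTS[name].handler];
  if (!handler) return { status: 'error', response: 'Sorry, I did not understand that.' };
  return handler(language);
}
```

Keeping the handler as a name (rather than a function reference) makes the INTENTS table serializable, at the cost of a runtime lookup that can fail if the names drift apart.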
Add your language's Unicode range in detectLanguage():

const urduChars = /[\u0600-\u06FF]/g; // Add this

Then branch on the detected language in your handlers:

return language === 'ur' ? 'Urdu response' : 'English response';
Finally, set the transcription language in transcribe.py:

result = model.transcribe(data, language="ur")
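Pulling those pieces together, a character-range detector might look like this. The ranges are the standard Unicode blocks for Devanagari and Arabic script; the function shape is an assumption about detectLanguage(), not the verbatim source:

```javascript
// Guess the message language from the script of its characters:
// 'hi' for Devanagari text, 'ur' for Arabic-script text, 'en' otherwise.
function detectLanguage(text) {
  const hindiChars = (text.match(/[\u0900-\u097F]/g) || []).length; // Devanagari block
  const urduChars = (text.match(/[\u0600-\u06FF]/g) || []).length;  // Arabic block
  if (hindiChars > 0 && hindiChars >= urduChars) return 'hi';
  if (urduChars > 0) return 'ur';
  return 'en';
}
```

Counting matches per block and picking the dominant script is robust to a few stray Latin characters (numbers, emoji, product names) in an otherwise Hindi or Urdu message.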
In transcribe.py:
model = whisper.load_model("tiny") # Fastest, ~75MB
model = whisper.load_model("base") # Default, 140MB
model = whisper.load_model("small") # Better, 466MB
model = whisper.load_model("medium") # Good, 1.5GB
Scripts:
transcribe.py - Whisper transcription (Python)
voice-processor.js - Core logic (intent parsing, handlers)
voice-listener-daemon.js - Auto-listener watching for new messages

References:
SETUP.md - Installation and configuration
API.md - Detailed function documentation

If running as a Clawdbot skill, hook into message events:
// In your Clawdbot handler
const { processVoiceNote } = require('skills/whatsapp-voice-talk/scripts/voice-processor');
message.on('voice', async (audioBuffer) => {
const result = await processVoiceNote(audioBuffer, message.from);
// Send response back
await message.reply(result.response);
// Or send as voice (requires TTS)
await sendVoiceMessage(result.response);
});
OGG (Opus), WAV, FLAC, MP3, CAF, AIFF, and more via libsndfile.
WhatsApp uses Opus-coded OGG by default — works out of the box.
"No module named 'whisper'"
pip install openai-whisper
"No module named 'soundfile'"
pip install soundfile
Voice messages not processing?
clawdbot status (is it running?)
~/.clawdbot/media/inbound/ (files arriving?)
node scripts/voice-listener-daemon.js (see logs)

Slow transcription?
Use smaller model: whisper.load_model("base") or "tiny"
references/SETUP.md for detailed installation and configuration
references/API.md for function signatures and examples
scripts/ for working code

MIT - Use freely, customize, contribute back!
Built for real-world use in Clawdbot. Battle-tested with multiple languages and use cases.
Generated Mar 1, 2026
Deploy on WhatsApp to handle voice inquiries in English and Hindi, transcribing queries, detecting intents like order status or FAQs, and responding with automated voice or text. Ideal for e-commerce or service industries to reduce call center load and support non-literate users.
Integrate with IoT systems to allow users to send voice commands via WhatsApp for controlling lights, appliances, or security devices. The skill transcribes commands, matches intents to device actions, and confirms execution with TTS responses, enabling hands-free home automation.
Use for healthcare providers to let patients book appointments, check medication schedules, or get health tips via WhatsApp voice messages. The skill processes voice notes, detects intents like 'schedule appointment', and sends reminders or confirmations, improving accessibility in rural areas.
Implement in educational apps to help students practice English or Hindi pronunciation by sending voice messages for transcription and feedback. The skill can detect language errors, provide corrections via TTS, and track progress, supporting remote learning environments.
Deploy for logistics companies to enable drivers or customers to check delivery status, report issues, or update routes using WhatsApp voice messages. The skill transcribes queries, matches intents to database queries, and sends real-time updates via voice or text.
Offer the skill as a cloud-based service where businesses pay a monthly fee per user or message volume to integrate WhatsApp voice automation. Revenue comes from tiered plans based on features like custom intents, language support, and API calls, targeting SMEs and enterprises.
License the skill as a customizable package for developers or agencies to embed in their own applications, charging a one-time fee or annual license. Revenue is generated through upfront payments and optional support contracts, appealing to tech startups and integrators.
Provide the skill via an API where clients pay per voice message processed, with volume discounts for large-scale deployments. Revenue accrues from transaction fees, ideal for large corporations or platforms needing scalable, on-demand voice processing without upfront costs.
💬 Integration Tip
Ensure Python and Node.js dependencies are installed, and test with sample audio files before deploying in production to verify intent matching and TTS output.