openclaw-voice
Transcribe audio to text and generate spoken AI responses using Whisper and ElevenLabs via CLI with transcript storage and search.
Install via ClawdBot CLI:
clawdbot install frank-bot07/openclaw-voice
Grade: Fair, based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Potentially destructive shell commands in tool definitions
exec(
Calls external URL not in known-safe list
https://img.shields.io/badge/tests-10%20passing-brightgreen
AI Analysis
The skill's primary risk is local shell command execution (sox/rec, ffplay) for audio processing, which could be exploited if inputs are not properly sanitized. No evidence of unauthorized data exfiltration, credential harvesting, or hidden instructions was found in the provided definition. External API usage (ElevenLabs, Whisper) appears consistent with the stated purpose of voice conversation AI.
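One common way to reduce the shell-execution risk described above is to pass arguments as a list so that no shell ever parses the file name. The sketch below is illustrative only. It is written in Python rather than the skill's own Node.js code, and `build_playback_command` and `play` are hypothetical helper names, not part of openclaw-voice. It assumes `ffplay` is on the PATH and uses its `-nodisp` and `-autoexit` options.

```python
import os
import subprocess


def build_playback_command(audio_path: str) -> list[str]:
    """Return an argv list for ffplay (hypothetical helper, not the skill's API)."""
    if "\x00" in audio_path:
        raise ValueError("audio path contains a NUL byte")
    # An absolute path cannot be mistaken for a command-line option,
    # even if the original name started with "-".
    absolute = os.path.abspath(audio_path)
    return ["ffplay", "-nodisp", "-autoexit", absolute]


def play(audio_path: str) -> None:
    """Play a file without a shell, so metacharacters in the name stay literal."""
    # A list argument with the default shell=False means characters such as
    # ";" or "$(...)" are passed to ffplay as part of one file name.
    subprocess.run(build_playback_command(audio_path), check=True)
```

With this approach a file name such as `reply.wav; rm -rf ~` reaches `ffplay` as a single argument and is never interpreted as a second command.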
Audited Apr 16, 2026 · audit v1.0
Generated Feb 26, 2026
Healthcare providers can use OpenClaw Voice to record patient interviews, automatically transcribe them, and store searchable transcripts for medical records. This eliminates manual note-taking during consultations while maintaining accurate documentation for billing and follow-up care.
Law firms can utilize the skill to record legal depositions, generate verbatim transcripts, and organize them in a searchable database. This provides paralegals with quick access to specific testimony segments while maintaining chain of custody through UUID tracking.
Researchers conducting qualitative studies can record participant interviews, automatically transcribe them, and store organized transcripts for analysis. The search functionality helps identify recurring themes across multiple interviews without manual transcription costs.
Small businesses can record customer support calls, transcribe conversations, and maintain searchable records of customer issues. This helps identify recurring problems and provides documentation for training new support staff.
Journalists can record interviews with sources, generate accurate transcripts, and organize them by story or source. The searchable database helps quickly locate specific quotes or information during article writing and fact-checking.
Offer the voice transcription service as a monthly subscription for professionals needing regular interview/documentation capabilities. Pricing is tiered by storage limits and transcription minutes, with enterprise plans for larger organizations requiring extensive search functionality.
Provide custom integration services for businesses wanting to embed the voice transcription capabilities into their existing workflows. This includes database schema customization, industry-specific command enhancements, and training for staff on CLI operations.
Partner with existing transcription services to offer their clients enhanced search and organization capabilities. The skill becomes a value-add feature for premium clients who need structured, queryable interview databases beyond basic transcription.
💬 Integration Tip
Ensure system audio tools (sox/rec, ffplay) are properly installed and configured before deployment, as the skill relies on child_process calls rather than packaged audio libraries.
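A pre-deployment check along the lines of the tip above could look like the following minimal sketch. It assumes the three executables named in the tip (`sox`, `rec`, `ffplay`) are the ones the skill needs. `REQUIRED_TOOLS` and `missing_tools` are hypothetical names introduced here, not part of the skill.

```python
import shutil

# Tool names taken from the integration tip; adjust if the skill needs others.
REQUIRED_TOOLS = ("sox", "rec", "ffplay")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing audio tools: " + ", ".join(missing))
    else:
        print("All required audio tools were found.")
```

The `which` parameter exists only so the lookup can be replaced in tests. In normal use `shutil.which` searches the current PATH. Note that `rec` is normally installed together with SoX, and `ffplay` ships with FFmpeg.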
Scored Apr 19, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud)
Start voice calls via the OpenClaw voice-call plugin.