speech-translation
Build, adapt, or run an audio-processing workflow that takes spoken audio, transcribes it with Whisper or faster-whisper, translates the transcript using the...
Install via ClawdBot CLI:
clawdbot install decin/speech-translation
Grade: Fair (based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals).
Calls external URL not in known-safe list
http://127.0.0.1:8000/translate
Audited Apr 16, 2026 · audit v1.0
Generated Apr 14, 2026
A company uses the skill to transcribe and translate customer voice inquiries in real time during chat interactions, so support agents can reply to non-native speakers with translated audio. This reduces language barriers and improves service efficiency in global markets.
Language learning platforms integrate the skill to provide students with immediate transcription and translation of spoken exercises, followed by synthesized audio in the target language for pronunciation practice. It enhances interactive learning through conversational feedback.
Healthcare providers deploy the skill to transcribe patient voice notes, translate medical instructions into the patient's native language, and output audio for clear communication. This aids in reducing misunderstandings and improving patient adherence in multilingual settings.
Travel apps use the skill to process tourist voice queries, transcribe them, translate into local languages, and provide audio responses for directions or recommendations. It facilitates seamless communication for travelers in foreign countries.
Media companies apply the skill in a local pipeline mode to transcribe audio from videos, translate scripts, and generate dubbed audio tracks for international audiences. This streamlines content adaptation for global distribution.
Offer the skill as a cloud-based API service where businesses pay a monthly fee for access to voice translation features, including transcription, LLM-assisted translation, and TTS synthesis. Revenue is generated through tiered pricing based on usage volume and support levels.
Sell customized on-premise deployments of the skill to large organizations, such as healthcare or customer service firms, with integration into existing chat systems. Revenue comes from one-time licensing fees and ongoing maintenance contracts.
Integrate the skill into a free mobile app for basic voice translation, with premium features like faster processing, advanced language support, and ad-free usage available via in-app purchases. Revenue is driven by upgrades and advertising in the free version.
💬 Integration Tip
In chat-native mode, use the OpenClaw tts tool to send audio replies immediately inside existing chat platforms. In local pipeline mode, validate the workflow with mock tests before full deployment.
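The mock-testing advice for local pipelines can be sketched as follows. This is a minimal illustration, not the skill's actual code: the names `Pipeline`, `PipelineResult`, and the `mock_*` helpers are hypothetical, and the three stages are assumed to be plain callables so that real Whisper, translation, and TTS backends can be swapped in later.

```python
from dataclasses import dataclass
from typing import Callable

# All names here are hypothetical, chosen for illustration only.


@dataclass
class PipelineResult:
    transcript: str
    translation: str
    audio: bytes


@dataclass
class Pipeline:
    transcribe: Callable[[bytes], str]    # e.g. a Whisper or faster-whisper wrapper
    translate: Callable[[str, str], str]  # (text, target_lang) -> translated text
    synthesize: Callable[[str], bytes]    # text -> audio bytes (TTS)

    def run(self, audio: bytes, target_lang: str) -> PipelineResult:
        transcript = self.transcribe(audio)
        if not transcript.strip():
            # Fail early instead of translating and synthesizing nothing.
            raise ValueError("empty transcript; nothing to translate")
        translation = self.translate(transcript, target_lang)
        return PipelineResult(transcript, translation, self.synthesize(translation))


# Mock stages: deterministic stand-ins that need no model, network, or audio device.
def mock_transcribe(audio: bytes) -> str:
    return "hello world"


def mock_translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"


def mock_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


mock_pipeline = Pipeline(mock_transcribe, mock_translate, mock_synthesize)
result = mock_pipeline.run(b"\x00\x01", target_lang="es")
print(result.translation)  # prints "[es] hello world"
```

Because each stage is injected, the same `run` method can be exercised with mocks first and with real backends afterwards, which keeps wiring errors separate from model or network failures.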
Scored Apr 19, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud)
Start voice calls via the OpenClaw voice-call plugin.