transcribe
Transcribe audio files to text using local Whisper (Docker). Use when receiving voice messages, audio files (.mp3, .m4a, .ogg, .wav, .webm), or when asked to transcribe audio content.
Install via ClawdBot CLI:
clawdbot install javicasper/transcribe
Local audio transcription using faster-whisper in Docker.
cd /path/to/skills/transcribe/scripts
chmod +x install.sh
./install.sh
This builds the Docker image whisper:local and installs the transcribe CLI.
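After running the installer, it can help to confirm that both pieces it is described as producing are actually present. The following is a minimal sketch, not part of the skill itself: the function name check_transcribe_install is hypothetical, and it assumes install.sh placed transcribe on your PATH.

```shell
# Post-install sanity check (sketch). Assumes the image is tagged whisper:local
# and the CLI is reachable as `transcribe`, as the install step above states.
check_transcribe_install() {
  local ok=0
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH" >&2
    ok=1
  elif ! docker image inspect whisper:local >/dev/null 2>&1; then
    echo "image whisper:local not found; re-run install.sh" >&2
    ok=1
  fi
  if ! command -v transcribe >/dev/null 2>&1; then
    echo "transcribe CLI not found on PATH" >&2
    ok=1
  fi
  return "$ok"
}
```

Calling check_transcribe_install returns 0 when everything is in place and prints a hint to stderr otherwise.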
transcribe /path/to/audio.mp3 [language]
es (Spanish)
auto for auto-detection
transcribe /tmp/voice.ogg # Spanish (default)
transcribe /tmp/meeting.mp3 en # English
transcribe /tmp/audio.m4a auto # Auto-detect
mp3, m4a, ogg, wav, webm, flac, aac
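The usage and format list above can be combined into a small batch helper. This is a sketch under stated assumptions: transcribe_dir is a hypothetical name, and it assumes transcribe prints the transcript to stdout, which the source does not confirm.

```shell
# Transcribe every supported audio file in a directory (sketch).
# Assumption: `transcribe` writes the transcript to stdout.
# Note: the extension match is case-sensitive, so VOICE.MP3 is skipped.
transcribe_dir() {
  local dir="$1"
  local lang="${2:-es}"   # es is the documented default language
  local f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    case "${f##*.}" in
      mp3|m4a|ogg|wav|webm|flac|aac)
        # Write the transcript next to the audio file as <name>.txt
        transcribe "$f" "$lang" > "${f%.*}.txt" || echo "failed: $f" >&2
        ;;
    esac
  done
}
```

For example, transcribe_dir /tmp/recordings en would process each supported file in a (hypothetical) /tmp/recordings directory as English and skip everything else.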
transcribe
scripts/transcribe - CLI wrapper (bash)
scripts/install.sh - Installation script (includes Dockerfile inline)
small (fast) - edit install.sh for large-v3 (accurate)
Generated Mar 1, 2026
Transcribe customer voice messages from support hotlines or messaging apps into text for easier analysis and response. This helps agents quickly understand issues without listening to audio, improving efficiency and record-keeping.
Convert audio recordings of legal depositions or interviews into text transcripts for documentation and review. This aids lawyers in searching and referencing key points, ensuring accurate case preparation without relying on external services.
Transcribe audio from lectures, workshops, or online courses to create accessible text notes for students. This supports learning by providing searchable content and aids educators in creating study materials locally.
Transcribe audio from patient consultations or medical meetings into text for electronic health records. This enhances data accuracy, facilitates quick retrieval of information, and maintains privacy by keeping processing local.
Convert recorded interviews or field reports into text for journalists to edit and reference. This speeds up article creation, ensures verbatim accuracy, and avoids cloud-based transcription services for sensitive content.
Offer basic transcription for free with limited features, then charge for advanced options like multiple languages or faster processing. This attracts small businesses and individuals, generating revenue through subscription tiers.
License the transcription tool to companies for integration into their existing systems, such as CRM or content management platforms. Provide custom support and updates, earning revenue through licensing and service contracts.
Deploy the tool as an API where users pay based on audio duration or number of transcriptions processed. Target developers and enterprises needing scalable, on-demand transcription without upfront costs.
💬 Integration Tip
Ensure Docker is installed and running and that audio file paths are readable from where transcribe runs; save voice messages to temporary files and delete them after transcription so they do not accumulate on disk.
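One way to follow the temp-file advice is to write the incoming audio to a throwaway directory and remove it once transcription finishes. This is a sketch only: transcribe_stdin is a hypothetical helper, and it assumes the audio bytes arrive on stdin, that transcribe prints to stdout, and that the temp directory is visible to Docker on your system.

```shell
# Transcribe audio read from stdin via a temporary file, then clean up (sketch).
# Usage: transcribe_stdin [extension] [language]
transcribe_stdin() {
  local ext="${1:-ogg}"
  local lang="${2:-auto}"
  local tmpdir status
  tmpdir=$(mktemp -d) || return 1
  # Keep the extension in the file name in case the CLI relies on it
  cat > "$tmpdir/voice.$ext"
  transcribe "$tmpdir/voice.$ext" "$lang"
  status=$?
  rm -rf "$tmpdir"   # remove the audio whether or not transcription succeeded
  return "$status"
}
```

A caller could then run something like transcribe_stdin ogg es < incoming.ogg, where incoming.ogg stands in for whatever the messaging integration delivers.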