# audio-transcribe

Auto-transcribe voice messages locally using faster-whisper with selectable Whisper models. No API key required.
Install via the ClawdBot CLI:

```shell
clawdbot install AKTheKnight/audio-transcribe
```
Install the dependency:

```shell
pip install faster-whisper
```

Models download automatically on first use.

Run the script directly:

```shell
python3 /root/clawd/skills/audio-transcribe/scripts/transcribe.py /path/to/audio.ogg
```
To change the model, edit transcribe.py:

```python
model = WhisperModel('small', device='cpu', compute_type='int8')  # Options: tiny, base, small, medium, large-v3
```
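The bundled script is not reproduced here, but a minimal transcriber built on the public faster-whisper API looks roughly like the sketch below (`WhisperModel.transcribe` returns an iterator of segments plus an info object; the `join_segments` helper is our own illustration, not part of the skill):

```python
import sys

def join_segments(texts):
    """Join per-segment text into a single transcript string."""
    return " ".join(t.strip() for t in texts if t.strip())

def transcribe(path, model_size="small"):
    # Lazy import so a missing dependency only fails when
    # transcription is actually attempted.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(path)
    return join_segments(seg.text for seg in segments)

if __name__ == "__main__" and len(sys.argv) > 1:
    print(transcribe(sys.argv[1]))
```

`int8` quantization on CPU keeps memory low at a small accuracy cost; on a CUDA machine, `device="cuda"` with `compute_type="float16"` is the usual alternative.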
| Model | Size | VRAM/RAM | Speed | Use Case |
|-------|------|----------|-------|----------|
| tiny | 39 MB | ~1 GB | ⚡⚡⚡ | Quick drafts |
| base | 74 MB | ~1 GB | ⚡⚡ | Basic accuracy |
| small | 244 MB | ~2 GB | ⚡ | Recommended |
| medium | 769 MB | ~5 GB | 🐢 | Better accuracy |
| large-v3 | 1.5 GB | ~10 GB | 🐢🐢 | Best accuracy |
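Rather than editing the script for every model change, the size could be read from an environment variable. A small sketch, where the `WHISPER_MODEL` variable name and `pick_model` helper are our own assumptions rather than part of the skill:

```python
import os

# Model sizes from the table above.
VALID_MODELS = ("tiny", "base", "small", "medium", "large-v3")

def pick_model(name):
    """Return a valid model size, falling back to the recommended 'small'."""
    return name if name in VALID_MODELS else "small"

# Hypothetical: read the size from the environment instead of hardcoding it.
model_size = pick_model(os.environ.get("WHISPER_MODEL"))
# model = WhisperModel(model_size, device="cpu", compute_type="int8")
```

Falling back to `small` keeps unknown values from crashing the bot mid-conversation.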
Clawdbot auto-transcribes incoming voice messages when this skill is enabled.
Files:

- scripts/transcribe.py - Main transcription script
- SKILL.md - This file

Generated Feb 26, 2026
Journalists can quickly transcribe recorded interviews or field reports using this skill, enabling faster article drafting and fact-checking. It supports various audio formats and runs locally, ensuring data privacy for sensitive conversations.
Educators and students can transcribe lecture recordings to create accessible notes or subtitles for online courses. The local processing avoids API costs and handles multiple model sizes for balancing speed and accuracy.
Businesses can transcribe customer support calls to analyze common issues and improve service quality. The skill integrates with bots like Clawdbot for automated transcription of voice messages in real-time.
Podcast creators can use this skill to generate transcripts for episodes, enhancing accessibility and SEO. The recommended small model offers a good balance of accuracy and resource efficiency for regular use.
Law firms can transcribe audio recordings from depositions or meetings to create official documents. Local operation ensures confidentiality, and model options allow customization based on accuracy needs.
Offer basic transcription for free using the tiny or base model, then charge for higher accuracy with small or medium models. Integrate with platforms to provide automated transcription as a paid add-on.
License the skill to companies for internal use in customer service or training departments. Provide customization and support for integrating with existing voice message systems to streamline workflows.
Sell packaged versions of the skill to educational institutions for transcribing lectures and creating accessible content. Include training and updates as part of the package to ensure ease of use.
💬 Integration Tip

Ensure the faster-whisper library is installed and test with a sample audio file first. For Clawdbot integration, simply enable the skill; incoming voice messages are then transcribed automatically.