# local-whisper

Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High-quality transcription with multiple model sizes.
Install via ClawdBot CLI:
clawdbot install araa47/local-whisper

Requires: ffmpeg for audio decoding.
# Basic
~/.clawdbot/skills/local-whisper/scripts/local-whisper audio.wav
# Better model
~/.clawdbot/skills/local-whisper/scripts/local-whisper audio.wav --model turbo
# With timestamps
~/.clawdbot/skills/local-whisper/scripts/local-whisper audio.wav --timestamps --json
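With `--timestamps --json`, the script emits structured output. Below is a minimal sketch of consuming it; the field names (`text`, `segments`, `start`, `end`) mirror openai-whisper's result dict, but the wrapper's actual JSON schema is an assumption here, so inspect one real output first:

```python
import json

# Sample payload shaped like openai-whisper's result dict; the skill's
# actual JSON output may differ -- verify against a real run.
raw = '''
{
  "text": "hello world",
  "language": "en",
  "segments": [
    {"start": 0.0, "end": 1.2, "text": "hello world"}
  ]
}
'''

result = json.loads(raw)

def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

# Turn segments into minimal SRT subtitle blocks.
srt_blocks = []
for i, seg in enumerate(result["segments"], start=1):
    srt_blocks.append(
        f"{i}\n{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}\n"
        f"{seg['text'].strip()}\n"
    )
print("\n".join(srt_blocks))
```

The same loop extends to VTT or plain-text quoting with timecodes.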
| Model | Parameters | Notes |
|-------|-----------|-------|
| tiny | 39M | Fastest |
| base | 74M | Default |
| small | 244M | Good balance |
| turbo | 809M | Best speed/quality |
| large-v3 | 1.5B | Maximum accuracy |
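A mistyped model name triggers a multi-hundred-megabyte download attempt or a late failure, so it can pay to validate the name first. A small hypothetical helper (not part of the skill) built from the table above:

```python
# Model names offered by this skill, per the table above.
# Upstream whisper also ships "medium", which this skill does not list.
KNOWN_MODELS = ("tiny", "base", "small", "turbo", "large-v3")

def validate_model(name: str) -> str:
    """Return the model name if known, else raise with the valid choices."""
    if name not in KNOWN_MODELS:
        raise ValueError(f"unknown model {name!r}; choose one of {KNOWN_MODELS}")
    return name

print(validate_model("turbo"))
```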
Options:

- `--model`/`-m` — Model size (default: base)
- `--language`/`-l` — Language code (auto-detect if omitted)
- `--timestamps`/`-t` — Include word timestamps
- `--json`/`-j` — JSON output
- `--quiet`/`-q` — Suppress progress

Uses a uv-managed venv at `.venv/`. To reinstall:
cd ~/.clawdbot/skills/local-whisper
uv venv .venv --python 3.12
uv pip install --python .venv/bin/python click openai-whisper torch --index-url https://download.pytorch.org/whl/cpu
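Once installed, the script can be driven from another program via subprocess. This sketch only assembles the argument list from the documented flags; actually executing it requires the skill and its venv to be present:

```python
import os
import shlex

# Path as documented above; expanduser resolves "~" for the current user.
SCRIPT = os.path.expanduser("~/.clawdbot/skills/local-whisper/scripts/local-whisper")

def build_cmd(audio_path: str, model: str = "base",
              timestamps: bool = False, as_json: bool = False) -> list[str]:
    """Assemble a local-whisper invocation as a subprocess-ready list."""
    cmd = [SCRIPT, audio_path, "--model", model]
    if timestamps:
        cmd.append("--timestamps")
    if as_json:
        cmd.append("--json")
    return cmd

cmd = build_cmd("audio.wav", model="turbo", timestamps=True, as_json=True)
print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal
```

Pass the resulting list to `subprocess.run(cmd, capture_output=True)` to collect the transcript.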
Generated Feb 24, 2026
Journalists can use Local Whisper to transcribe interviews and press conferences offline, ensuring data privacy and avoiding cloud service costs. It supports multiple languages and timestamps for accurate quoting and editing.
Researchers in social sciences can transcribe qualitative interviews locally, handling sensitive data without internet dependency. The high-quality models like large-v3 provide accurate transcriptions for detailed analysis.
Content creators can generate subtitles and transcripts for podcasts or videos offline, improving accessibility and SEO. The turbo model offers a good balance of speed and quality for efficient workflow.
Legal professionals can transcribe depositions and meetings locally to maintain confidentiality and compliance. The JSON output with timestamps aids in creating precise legal records.
Healthcare providers can transcribe patient consultations offline, ensuring HIPAA compliance and privacy. The quiet mode allows for discreet use during sensitive discussions.
Offer a free basic version with tiny or base models, and charge for premium features like advanced models (e.g., turbo, large-v3), batch processing, or API integrations. Revenue comes from subscription plans for businesses needing high-volume transcription.
Sell licenses for on-premise deployment to organizations requiring offline, secure transcription, such as government or financial institutions. Include setup support, custom integrations, and maintenance contracts for recurring revenue.
Provide a software development kit (SDK) that allows developers to integrate Local Whisper into their applications, such as note-taking apps or video editors. Monetize through per-application licensing or usage-based fees for commercial use.
💬 Integration Tip
Ensure ffmpeg is installed for audio processing, and use the provided uv setup for easy environment management to avoid dependency issues.
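The ffmpeg check can be automated before transcribing, using only the standard library:

```python
import shutil

def ffmpeg_available() -> bool:
    """True if an ffmpeg binary is on PATH (Whisper needs it to decode audio)."""
    return shutil.which("ffmpeg") is not None

if not ffmpeg_available():
    print("ffmpeg not found -- install it first "
          "(e.g. `brew install ffmpeg` or `apt install ffmpeg`)")
```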
Related skills:

- Transcribe audio via the OpenAI Audio Transcriptions API (Whisper).
- Local speech-to-text with the Whisper CLI (no API key).
- ElevenLabs text-to-speech with mac-style `say` UX.
- Text-to-speech via the node-edge-tts npm package: multiple voices, languages, speed, pitch control, and subtitle generation; use for "tts" requests and spoken output.
- End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops.
- Text-to-speech via the OpenAI Audio Speech API.