OpenAI Whisper (Local): Private, Offline Speech-to-Text — No API Key Required
13,400+ downloads and 70 stars — the Whisper (Local) Skill by @steipete brings OpenAI's industry-leading speech recognition to your Clawdbot workflow with zero cloud dependency and zero API key. Install the openai-whisper brew package, and every audio file on your machine becomes instantly transcribable.
Note: This is the local Whisper skill, which runs the model entirely on your device. If you need cloud-based transcription with the latest Whisper API and GPT-4o audio models, see the separate openai-whisper-api skill.
The Problem It Solves
Audio transcription is one of those tasks that seems simple until you hit the privacy wall. Uploading your meeting recordings, legal depositions, or research interviews to a cloud API means trusting a third party with potentially sensitive content. Local Whisper solves this completely — your audio never leaves your machine. And for heavy users, the cost math is simple: $0 per transcription, forever, regardless of volume.
How It Works
OpenAI released Whisper as an open-source model that runs on consumer hardware. The whisper CLI (installed via Homebrew's openai-whisper formula) loads the model locally and processes audio entirely on your CPU or GPU. The Clawdbot skill provides the context layer so your agent knows how to invoke it correctly.
The skill defaults to the turbo model — OpenAI's fastest accurate option — but you can choose from the full model ladder:
| Model | Size | Speed | Best For |
|---|---|---|---|
| tiny | ~39MB | Fastest | Quick drafts, real-time |
| base | ~74MB | Very fast | Short clips |
| small | ~244MB | Fast | General use |
| medium | ~769MB | Moderate | High accuracy |
| large | ~1.5GB | Slow | Maximum accuracy |
| turbo | ~809MB | Fast | Default — best balance |
Models download automatically to ~/.cache/whisper on first use and are cached for subsequent runs.
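To gauge disk impact before pulling several models, the sizes in the table above can be summed. The helper below is an illustrative sketch: MODEL_SIZES_MB holds the approximate figures from the table, and cached_models assumes the ~/.cache/whisper layout described here.

```python
from pathlib import Path

# Approximate download sizes in MB, taken from the model table above
MODEL_SIZES_MB = {"tiny": 39, "base": 74, "small": 244,
                  "medium": 769, "large": 1536, "turbo": 809}

def cached_models(cache_dir: Path = Path.home() / ".cache" / "whisper") -> list[str]:
    """List model checkpoints already downloaded to the Whisper cache."""
    if not cache_dir.is_dir():
        return []
    return sorted(p.stem for p in cache_dir.glob("*.pt"))

def download_estimate_mb(models: list[str]) -> int:
    """Total approximate download size for a set of models."""
    return sum(MODEL_SIZES_MB[m] for m in models)

print(download_estimate_mb(["turbo", "large"]))  # → 2345
```

Since models are cached after the first run, this cost is paid once per model, not per transcription.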
Installation
```shell
# Install the Whisper CLI via Homebrew (macOS)
brew install openai-whisper

# Verify installation
whisper --help
```
Clawdbot automatically handles installation via the skill's install manifest when you first use it.
Core Usage
Basic Transcription
```shell
# Transcribe to text file (simplest)
whisper /path/to/audio.mp3 --model turbo --output_format txt --output_dir .

# Transcribe to multiple formats at once
whisper /path/to/recording.m4a --model medium --output_format all --output_dir ./transcripts
```
Translation to English
```shell
# Translate non-English audio directly to English text
whisper /path/to/french-meeting.mp3 --task translate --output_format srt
```
Output Formats
| Format | Flag | Use Case |
|---|---|---|
| txt | --output_format txt | Plain text, copy-paste ready |
| srt | --output_format srt | Video subtitles |
| vtt | --output_format vtt | Web video subtitles |
| tsv | --output_format tsv | Spreadsheet import |
| json | --output_format json | Programmatic processing |
| all | --output_format all | All formats at once |
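The json format is the most useful for scripting. A minimal parsing sketch, assuming the top-level "segments" list (with start, end, and text fields) that Whisper writes with --output_format json:

```python
import json

def segments_to_transcript(whisper_json: str) -> list[tuple[float, float, str]]:
    """Extract (start, end, text) triples from Whisper's JSON output.

    Assumes the top-level "segments" list produced by --output_format json.
    """
    data = json.loads(whisper_json)
    return [(s["start"], s["end"], s["text"].strip()) for s in data["segments"]]

# A tiny stand-in for a real output file, matching the assumed shape
sample = json.dumps({
    "text": " Hello world.",
    "segments": [{"start": 0.0, "end": 2.4, "text": " Hello world."}],
    "language": "en",
})
print(segments_to_transcript(sample))  # → [(0.0, 2.4, 'Hello world.')]
```

The same triples map directly onto SRT or VTT cues if you need custom subtitle post-processing.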
Model Selection for Performance
```shell
# For speed (laptops, quick drafts)
whisper audio.mp3 --model tiny

# For accuracy (important recordings)
whisper audio.mp3 --model large

# Specify language explicitly for better accuracy
whisper audio.mp3 --model medium --language Japanese
```
Practical Workflows
Podcast episode transcription:
```shell
whisper episode-47.mp3 --model medium --output_format txt --output_dir ./show-notes
```
Batch transcription (multiple files):
```shell
whisper *.mp3 --model turbo --output_format srt --output_dir ./subtitles
```
Legal/medical transcription (maximum accuracy, keep local):
```shell
whisper deposition.m4a --model large --language en --output_format txt --output_dir .
```
How Clawdbot Uses It
With this skill installed, you can ask Clawdbot naturally:
- "Transcribe the meeting recording at ~/Downloads/standup.mp3"
- "Translate this French audio file to English text"
- "Create SRT subtitles for my video using the medium model"
Clawdbot handles the command construction and file path resolution automatically.
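Under the hood, this amounts to assembling a whisper invocation like the ones shown earlier. A hedged sketch of that command construction follows; build_whisper_command is an illustrative helper, not the skill's actual code.

```python
import shlex
from pathlib import Path

def build_whisper_command(audio: str, model: str = "turbo",
                          fmt: str = "txt", out_dir: str = ".",
                          task: str = "transcribe") -> list[str]:
    """Construct a whisper CLI invocation from natural-language intent.

    Flag names mirror the CLI usage shown above; defaults match the
    skill's turbo-model default.
    """
    cmd = ["whisper", str(Path(audio).expanduser()),
           "--model", model,
           "--output_format", fmt,
           "--output_dir", out_dir]
    if task != "transcribe":  # transcribe is the CLI default
        cmd += ["--task", task]
    return cmd

print(shlex.join(build_whisper_command("standup.mp3", fmt="srt")))
# → whisper standup.mp3 --model turbo --output_format srt --output_dir .
```

Path expansion (the expanduser call) is the file-path resolution step; everything else is flag selection based on the request.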
Choosing Between Local Whisper Skills
ClawHub has three local Whisper skills — here's how they compare:
| | openai-whisper (this) | faster-whisper | local-whisper (araa47) |
|---|---|---|---|
| Backend | OpenAI Whisper CLI (PyTorch) | CTranslate2 | Whisper via uv + Python venv |
| API key required | No | No | No |
| Speed | Moderate | 4–6x faster (CPU), ~20x (GPU) | Similar to this skill |
| Speaker diarization | No | Yes | No |
| Word-level timestamps | No | Yes | Yes |
| YouTube URL input | No | Yes | No |
| Output formats | txt/srt/vtt/tsv/json | SRT/VTT/ASS/LRC + more | JSON + quiet mode |
| Setup | brew install | More complex | uv based |
Recommendation:
- This skill — simplest path: one brew install, done. Good for most users.
- faster-whisper — power users who need speed, multi-speaker recordings, or batch processing.
- local-whisper — if you prefer uv/venv isolation and need JSON output or quiet mode.
Local vs. API: Cost and Privacy
| | Local (this skill) | API (openai-whisper-api) |
|---|---|---|
| Cost | Free forever | $0.006/min |
| 10-hour archive | $0 | $3.60 |
| Privacy | 100% on-device | Audio sent to OpenAI |
| Speed | Depends on hardware | Fast, consistent |
| Latest models | Whisper only | GPT-4o transcribe, diarization |
| Internet required | No | Yes |
| Setup | brew install | API key required |
Use local if: you handle sensitive audio, transcribe in bulk, or want zero ongoing cost. Use API if: you need speaker diarization, the very latest accuracy, or have limited compute.
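The break-even arithmetic behind the table is straightforward; a quick sketch using the $0.006/min API rate quoted above:

```python
def api_cost_usd(minutes: float, rate_per_min: float = 0.006) -> float:
    """Whisper API cost at the $0.006/min rate quoted above."""
    return round(minutes * rate_per_min, 2)

# The 10-hour archive from the table is 600 minutes of audio
print(api_cost_usd(600))  # → 3.6
```

The per-file cost is small, but it scales linearly with volume, while the local skill's marginal cost stays at zero.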
Considerations
- macOS only — the Homebrew install path is macOS-specific; Linux users can install openai-whisper via pip
- First run is slow — model download happens on first use; large models take several minutes on slow connections
- CPU-heavy — the large model on CPU can be very slow; M-series Macs handle it well via Metal GPU acceleration
- English bias — Whisper is trained heavily on English; accuracy varies by language, especially for low-resource languages
- No speaker diarization — local Whisper doesn't identify who said what; for multi-speaker recordings, the API version or separate tools are needed
The Bigger Picture
The openai-whisper local skill represents a broader trend: open-weight models making cloud services optional for an entire category of AI tasks. For speech recognition, OpenAI's decision to release Whisper publicly means teams handling HIPAA data, legal recordings, or proprietary business discussions can get state-of-the-art transcription without sending audio to a third party. Steipete — who also authored the popular apple-reminders skill — packaged this into a seamless Clawdbot integration that takes seconds to install.
View the skill on ClawHub: openai-whisper