ClawHub Skills Lib

Tags: skill-spotlight · speech-audio · openai-whisper · clawhub · openclaw

OpenAI Whisper (Local): Private, Offline Speech-to-Text — No API Key Required

March 15, 2026 · 6 min read

13,400+ downloads and 70 stars — the Whisper (Local) Skill by @steipete brings OpenAI's industry-leading speech recognition to your Clawdbot workflow with zero cloud dependency and zero API key. Install the openai-whisper brew package, and every audio file on your machine becomes instantly transcribable.

Note: This is the local Whisper skill, which runs the model entirely on your device. If you need cloud-based transcription with the latest Whisper API and GPT-4o audio models, see the separate openai-whisper-api skill.

The Problem It Solves

Audio transcription is one of those tasks that seems simple until you hit the privacy wall. Uploading your meeting recordings, legal depositions, or research interviews to a cloud API means trusting a third party with potentially sensitive content. Local Whisper solves this completely — your audio never leaves your machine. And for heavy users, the cost math is simple: $0 per transcription, forever, regardless of volume.

How It Works

OpenAI released Whisper as an open-source model that runs on consumer hardware. The whisper CLI (installed via Homebrew's openai-whisper formula) loads the model locally and processes audio files entirely on your CPU or GPU. The Clawdbot skill provides the context layer so your agent knows how to invoke it correctly.

The skill defaults to the turbo model — OpenAI's fastest accurate option — but you can choose from the full model ladder:

| Model  | Size   | Speed     | Best For                |
|--------|--------|-----------|-------------------------|
| tiny   | ~39MB  | Fastest   | Quick drafts, real-time |
| base   | ~74MB  | Very fast | Short clips             |
| small  | ~244MB | Fast      | General use             |
| medium | ~769MB | Moderate  | High accuracy           |
| large  | ~1.5GB | Slow      | Maximum accuracy        |
| turbo  | ~809MB | Fast      | Default — best balance  |

Models download automatically to ~/.cache/whisper on first use and are cached for subsequent runs.

Installation

# Install the Whisper CLI via Homebrew (macOS)
brew install openai-whisper
 
# Verify installation
whisper --help

Clawdbot automatically handles installation via the skill's install manifest when you first use it.

Core Usage

Basic Transcription

# Transcribe to text file (simplest)
whisper /path/to/audio.mp3 --model turbo --output_format txt --output_dir .
 
# Transcribe to multiple formats at once
whisper /path/to/recording.m4a --model medium --output_format all --output_dir ./transcripts

Translation to English

# Translate non-English audio directly to English text
whisper /path/to/french-meeting.mp3 --task translate --output_format srt

Output Formats

| Format | Flag                | Use Case                     |
|--------|---------------------|------------------------------|
| txt    | --output_format txt | Plain text, copy-paste ready |
| srt    | --output_format srt | Video subtitles              |
| vtt    | --output_format vtt | Web video subtitles          |
| tsv    | --output_format tsv | Spreadsheet import           |
| json   | --output_format json| Programmatic processing      |
| all    | --output_format all | All formats at once          |
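The json format is the natural entry point for programmatic processing. As a sketch, assuming the segment layout whisper emits (a top-level `segments` list with `start`, `end`, and `text` fields), here is a minimal converter from that JSON to SRT-style cues:

```python
import json

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> 00:00:03,500."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def json_to_srt(whisper_json: str) -> str:
    """Convert whisper --output_format json output into numbered SRT cues."""
    data = json.loads(whisper_json)
    cues = []
    for i, seg in enumerate(data["segments"], start=1):
        cues.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)

# Example with a hand-written sample in that JSON shape:
sample = json.dumps({"segments": [
    {"start": 0.0, "end": 3.5, "text": " Hello there."},
    {"start": 3.5, "end": 7.2, "text": " Welcome to the show."},
]})
print(json_to_srt(sample))
```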

Model Selection for Performance

# For speed (laptops, quick drafts)
whisper audio.mp3 --model tiny
 
# For accuracy (important recordings)
whisper audio.mp3 --model large
 
# Specify language explicitly for better accuracy
whisper audio.mp3 --model medium --language Japanese

Practical Workflows

Podcast episode transcription:

whisper episode-47.mp3 --model medium --output_format txt --output_dir ./show-notes

Batch transcription (multiple files):

whisper *.mp3 --model turbo --output_format srt --output_dir ./subtitles

Legal/medical transcription (maximum accuracy, keep local):

whisper deposition.m4a --model large --language en --output_format txt --output_dir .
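For larger archives it can help to build the commands in a script and skip files that already have transcripts. A minimal sketch — the skip-if-exists logic, function names, and directory layout here are my own assumptions, not part of the skill:

```python
from pathlib import Path

def build_whisper_cmd(audio: Path, model: str = "turbo",
                      fmt: str = "srt",
                      out_dir: Path = Path("./subtitles")) -> list[str]:
    """Construct one whisper CLI invocation as an argv list."""
    return ["whisper", str(audio), "--model", model,
            "--output_format", fmt, "--output_dir", str(out_dir)]

def pending_files(src: Path, out_dir: Path, fmt: str = "srt") -> list[Path]:
    """Audio files in src that do not yet have a transcript in out_dir."""
    return [f for f in sorted(src.glob("*.mp3"))
            if not (out_dir / f.with_suffix(f".{fmt}").name).exists()]

# To actually run the batch, pass each command to subprocess.run(cmd, check=True).
```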

How Clawdbot Uses It

With this skill installed, you can ask Clawdbot naturally:

  • "Transcribe the meeting recording at ~/Downloads/standup.mp3"
  • "Translate this French audio file to English text"
  • "Create SRT subtitles for my video using the medium model"

Clawdbot handles the command construction and file path resolution automatically.

Choosing Between Local Whisper Skills

ClawHub has three local Whisper skills — here's how they compare:

|                       | openai-whisper (this)        | faster-whisper                | local-whisper (araa47)       |
|-----------------------|------------------------------|-------------------------------|------------------------------|
| Backend               | OpenAI Whisper CLI (PyTorch) | CTranslate2                   | Whisper via uv + Python venv |
| API key required      | No                           | No                            | No                           |
| Speed                 | Moderate                     | 4–6x faster (CPU), ~20x (GPU) | Similar to this skill        |
| Speaker diarization   | No                           | Yes                           | No                           |
| Word-level timestamps | No                           | Yes                           | Yes                          |
| YouTube URL input     | No                           | Yes                           | No                           |
| Output formats        | txt/srt/vtt/tsv/json         | SRT/VTT/ASS/LRC + more        | JSON + quiet mode            |
| Setup                 | brew install                 | More complex                  | uv-based                     |

Recommendation:

  • This skill — simplest path: one brew install, done. Good for most users.
  • faster-whisper — power users who need speed, multi-speaker recordings, or batch processing.
  • local-whisper — if you prefer uv/venv isolation and need JSON output or quiet mode.

Local vs. API: Cost and Privacy

|                   | Local (this skill)  | API (openai-whisper-api)       |
|-------------------|---------------------|--------------------------------|
| Cost              | Free forever        | $0.006/min                     |
| 10-hour archive   | $0                  | $3.60                          |
| Privacy           | 100% on-device      | Audio sent to OpenAI           |
| Speed             | Depends on hardware | Fast, consistent               |
| Latest models     | Whisper only        | GPT-4o transcribe, diarization |
| Internet required | No                  | Yes                            |
| Setup             | brew install        | API key required               |

Use local if: you handle sensitive audio, transcribe in bulk, or want zero ongoing cost. Use API if: you need speaker diarization, the very latest accuracy, or have limited compute.
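The table's cost math is easy to reproduce. A quick check, using the $0.006/min rate from the table (the volumes below are illustrative):

```python
API_RATE_PER_MIN = 0.006  # openai-whisper-api rate from the table, USD

def api_cost(hours: float) -> float:
    """API transcription cost in USD for a given number of audio hours."""
    return round(hours * 60 * API_RATE_PER_MIN, 2)

print(api_cost(10))    # the 10-hour archive from the table
print(api_cost(1000))  # a heavy transcription year
```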

Considerations

  • macOS only — the Homebrew install path is macOS-specific; Linux users can install openai-whisper via pip
  • First run is slow — model download happens on first use; large models take several minutes on slow connections
  • CPU-heavy — the large model on CPU can be very slow; M-series Macs handle it well via Metal GPU acceleration
  • English bias — Whisper is trained heavily on English; accuracy varies by language, especially for low-resource languages
  • No speaker diarization — local Whisper doesn't identify who said what; for multi-speaker recordings, the API version or separate tools are needed

The Bigger Picture

The openai-whisper local skill represents a broader trend: open-weight models making cloud services optional for an entire category of AI tasks. For speech recognition, OpenAI's decision to release Whisper publicly means teams handling HIPAA data, legal recordings, or proprietary business discussions can get state-of-the-art transcription without any compliance risk. Steipete — who also authored the popular apple-reminders skill — packaged this into a seamless Clawdbot integration that takes seconds to install.


View the skill on ClawHub: openai-whisper
