mlx-whisper
Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).
Install via ClawdBot CLI:
`clawdbot install Kevin37Li/mlx-whisper`

Install mlx-whisper (pip):
`pip install mlx-whisper`
Local speech-to-text using Apple MLX, optimized for Apple Silicon Macs.
```bash
# Transcribe with the recommended model
mlx_whisper /path/to/audio.mp3 --model mlx-community/whisper-large-v3-turbo

# Transcribe to a text file
mlx_whisper audio.m4a -f txt -o ./output

# Transcribe with a language hint
mlx_whisper audio.mp3 --language en --model mlx-community/whisper-large-v3-turbo

# Generate subtitles (SRT)
mlx_whisper video.mp4 -f srt -o ./subs

# Translate to English
mlx_whisper foreign.mp3 --task translate
```
| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| mlx-community/whisper-tiny | ~75MB | Fastest | Basic |
| mlx-community/whisper-base | ~140MB | Fast | Good |
| mlx-community/whisper-small | ~470MB | Medium | Better |
| mlx-community/whisper-medium | ~1.5GB | Slower | Great |
| mlx-community/whisper-large-v3 | ~3GB | Slowest | Best |
| mlx-community/whisper-large-v3-turbo | ~1.6GB | Fast | Excellent (Recommended) |
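The table marks whisper-large-v3-turbo as the recommended model, but `mlx_whisper` only uses it when `--model` is passed explicitly. A minimal sketch of a shell function that supplies it by default is shown below; the function name `mlxw` and the `MLXW_MODEL` environment variable are hypothetical, not part of mlx-whisper. Because the default `--model` is placed first, a `--model` given later on the command line should win (argparse keeps the last value of a repeated option).

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of mlx-whisper): always passes a model,
# defaulting to whisper-large-v3-turbo. MLXW_MODEL is a hypothetical
# override variable. All other arguments are forwarded unchanged.
mlxw() {
  mlx_whisper --model "${MLXW_MODEL:-mlx-community/whisper-large-v3-turbo}" "$@"
}
```

Usage, after adding the function to your shell profile: `mlxw audio.mp3 -f srt -o ./subs`, or `MLXW_MODEL=mlx-community/whisper-small mlxw audio.mp3` for a faster, smaller model.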
Models are downloaded on first use and cached under `~/.cache/huggingface/`. The default model is mlx-community/whisper-tiny; use `--model mlx-community/whisper-large-v3-turbo` for best results.
Generated Mar 1, 2026
Video creators and podcasters can use MLX Whisper to generate accurate subtitles or transcripts for their content, improving accessibility and SEO. It's ideal for local processing on Apple Silicon Macs without relying on cloud APIs, ensuring privacy and reducing costs.
Researchers and students can transcribe interviews, lectures, or field recordings for qualitative analysis. The local operation ensures data security for sensitive research, and the ability to handle multiple languages supports diverse academic projects.
Professionals can record and transcribe meetings, conferences, or client calls to create searchable archives and action items. This enhances productivity by automating note-taking and supports compliance with record-keeping requirements in regulated industries.
Organizations can use MLX Whisper to generate captions for recorded videos, making content accessible to hearing-impaired audiences. It's cost-effective for small to medium-sized businesses that need reliable, offline transcription tools.
Companies expanding globally can transcribe audio content and translate it into English (Whisper's translate task only produces English output) for marketing, training, or customer support materials. This speeds up localization workflows while keeping data on local devices.
Offer a free basic version with limited features (e.g., tiny model) and charge for premium features like advanced models, batch processing, or API integrations. Target individual creators and small businesses, generating revenue through subscriptions or one-time purchases.
Develop customized packages for enterprises needing secure, high-volume transcription, such as legal firms or healthcare providers. Include support, training, and integration with existing systems, charging based on usage tiers or annual contracts.
Operate a service agency that uses MLX Whisper to provide transcription and translation services to clients. Charge per audio minute or project, leveraging the tool's efficiency to scale operations and offer competitive pricing in the market.
💬 Integration Tip
Integrate MLX Whisper into workflows by automating transcription via scripts and storing outputs in cloud storage like Google Drive or Dropbox for easy access.
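As a minimal sketch of the scripted approach, assuming bash and the `mlx_whisper` flags shown earlier (`--model`, `-f`, `-o`), the hypothetical function `transcribe_dir` below transcribes every `.mp3`, `.m4a`, `.wav`, and `.mp4` file in one folder. Pointing its output folder at a locally synced Google Drive or Dropbox directory is one way to get the results into cloud storage; the function name, argument order, and extension list are illustrative choices, not part of mlx-whisper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: batch-transcribe every audio/video file in a folder.
# Usage: transcribe_dir IN_DIR OUT_DIR [MODEL] [FORMAT]
# The body runs in a subshell ( ... ), so the shopt changes below do not
# leak into the caller's shell.
transcribe_dir() (
  if [ $# -lt 2 ]; then
    echo "usage: transcribe_dir IN_DIR OUT_DIR [MODEL] [FORMAT]" >&2
    exit 2  # exits only the subshell, i.e. this function
  fi
  in_dir="$1"
  out_dir="$2"
  model="${3:-mlx-community/whisper-large-v3-turbo}"
  format="${4:-txt}"  # any -f value mlx_whisper accepts, e.g. txt or srt
  count=0

  mkdir -p "$out_dir"
  # nullglob: drop patterns that match nothing; nocaseglob: also match .MP3 etc.
  shopt -s nullglob nocaseglob
  for f in "$in_dir"/*.{mp3,m4a,wav,mp4}; do
    # By default mlx_whisper names each output after the input file's stem
    if mlx_whisper "$f" --model "$model" -f "$format" -o "$out_dir"; then
      count=$((count + 1))
    else
      echo "transcribe_dir: failed on $f" >&2
    fi
  done
  echo "transcribe_dir: transcribed $count file(s) into $out_dir" >&2
)
```

Usage: `source transcribe_dir.sh && transcribe_dir ~/Recordings ~/Dropbox/Transcripts "" srt` (an empty third argument keeps the default model). Because outputs are named after the input stem, files such as `talk.mp3` and `talk.mp4` in the same folder would overwrite each other's transcript.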
Related skills:
- Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
- Local speech-to-text with the Whisper CLI (no API key).
- ElevenLabs text-to-speech with mac-style say UX.
- Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
- End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.
- Text-to-speech via OpenAI Audio Speech API.