mlx-stt
Speech-To-Text with MLX (Apple Silicon) and open-source models (default: GLM-ASR-Nano-2512), run locally.
Install via ClawdBot CLI:
clawdbot install guoqiao/mlx-stt
Speech-To-Text/ASR/Transcribe with MLX (Apple Silicon) and open-source models (default: GLM-ASR-Nano-2512), run locally.
Free and accurate. No API key required. No server required.
Requirements:
mlx: macOS with Apple Silicon
brew: used to install deps if not available
To install dependencies, run:
bash ${baseDir}/install.sh
This script uses brew to install these CLI tools if they are not already available:
ffmpeg: converts audio formats when needed
uv: installs Python packages and runs Python scripts
mlx_audio: does the actual transcription
To transcribe an audio file, run this script:
bash ${baseDir}/mlx-stt.sh <audio_file_path>
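The single-file command above can be wrapped in a loop when a whole folder of recordings needs transcribing. The following is a minimal sketch, not part of the skill itself: `run_stt` and `transcribe_dir` are hypothetical helper names, and the sketch assumes that `mlx-stt.sh` prints the transcript to standard output and that `baseDir` points at the installed skill directory, neither of which is confirmed by the description here.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes mlx-stt.sh prints the transcript to stdout and that
# baseDir points at the installed skill directory; neither is confirmed here.

# Thin wrapper around the documented single-file command.
run_stt() {
  bash "${baseDir}/mlx-stt.sh" "$1"
}

# Transcribe each regular file in in_dir, writing <name>.txt into out_dir.
# Returns non-zero if any file failed.
transcribe_dir() {
  local in_dir="$1" out_dir="$2" failed=0 f name
  mkdir -p "$out_dir"
  for f in "$in_dir"/*; do
    [ -f "$f" ] || continue            # skip directories and an unmatched glob
    name="$(basename "$f")"
    if ! run_stt "$f" > "$out_dir/${name%.*}.txt"; then
      echo "failed: $f" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

A call such as `transcribe_dir ./recordings ./transcripts` would then process the files one at a time. Running them sequentially is a deliberate choice in this sketch, since each run presumably loads the model and competes for the same memory.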
Generated Mar 1, 2026
Researchers can transcribe interviews, lectures, or focus group recordings locally without relying on cloud services. This ensures data privacy and eliminates API costs for qualitative analysis.
Podcast creators on Apple Silicon Macs can generate accurate transcripts for episodes to create show notes, subtitles, or searchable content. Local processing avoids upload delays and subscription fees.
Journalists using MacBooks can quickly transcribe field interviews or press conferences offline, even without internet access. The tool supports various audio formats via ffmpeg conversion.
Content creators can generate captions for videos or audio content to meet accessibility standards. Running locally on macOS ensures fast processing without sharing sensitive media files externally.
Legal professionals can transcribe client meetings or deposition recordings privately on their Macs. Local execution maintains confidentiality and avoids third-party data handling risks.
Offer a free basic version with limited features, then charge for advanced capabilities like batch processing, custom vocabulary, or premium model support. Target individual creators and small teams.
Sell licenses to organizations requiring fully offline, secure transcription for sensitive audio data. Emphasize compliance with data protection regulations and Apple Silicon optimization.
Package the skill as an SDK or API for integration into other macOS applications. Charge developers for usage tiers, support, and customization services.
💬 Integration Tip
Ensure brew, ffmpeg, and uv are installed first; handle initial model download delays in your application flow.
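One way to act on this tip is a preflight check that runs before the first transcription. The sketch below is illustrative only: `missing_tools` is a hypothetical helper, and the check relies solely on the standard `command -v` lookup rather than on anything specific to this skill.

```shell
#!/usr/bin/env bash
# Sketch: print each named tool that cannot be found on PATH, one per line.
# Prints nothing when every tool is present.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example use (commented out so that sourcing this file has no side effects):
# missing="$(missing_tools brew ffmpeg uv)"
# if [ -n "$missing" ]; then
#   echo "Missing tools:" $missing >&2
#   exit 1
# fi
```

This check does not cover the model download. An application could, for example, show a progress notice on the first transcription and allow it a longer timeout than later runs.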
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.
Text-to-speech via OpenAI Audio Speech API.