mlx-stt
Speech-to-text with MLX (Apple Silicon) and open-source models (default: GLM-ASR-Nano-2512), running locally.
Install via ClawdBot CLI:
clawdbot install guoqiao/mlx-stt
Grade: Fair (based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals).
Calls an external URL that is not on the known-safe list.
https://github.com/guoqiao/skills/blob/main/mlx-stt/mlx-stt/SKILL.md
Audited Apr 17, 2026 · audit v1.0
Generated Mar 1, 2026
Researchers can transcribe interviews, lectures, or focus group recordings locally without relying on cloud services. This ensures data privacy and eliminates API costs for qualitative analysis.
Podcast creators on Apple Silicon Macs can generate accurate transcripts for episodes to create show notes, subtitles, or searchable content. Local processing avoids upload delays and subscription fees.
Journalists using MacBooks can transcribe field interviews or press conferences offline, once the model has been downloaded. The tool accepts various audio formats by converting them with ffmpeg.
Content creators can generate captions for videos or audio content to meet accessibility standards. Running locally on macOS avoids uploading sensitive media files to external services.
Legal professionals can transcribe client meetings or deposition recordings privately on their Macs. Local execution maintains confidentiality and avoids third-party data handling risks.
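The ffmpeg conversion step mentioned in the journalist use case can be sketched as follows. This is a minimal illustration, not the skill's actual code: the helper name ffmpeg_to_wav_cmd, the 16 kHz mono 16-bit PCM target, and the ".16k.wav" output naming are all assumptions. 16 kHz mono is a common input format for ASR models, but the listing does not say what mlx-stt uses internally.

```python
import shlex
import sys
from pathlib import Path
from typing import List, Optional


def ffmpeg_to_wav_cmd(src: str, dst: Optional[str] = None, sample_rate: int = 16000) -> List[str]:
    """Build (but do not run) an ffmpeg command that converts any
    ffmpeg-readable audio or video file to mono 16-bit PCM WAV.

    Hypothetical helper: the 16 kHz mono target is an assumption, not a
    documented mlx-stt requirement.
    """
    if dst is None:
        p = Path(src)
        # Write next to the input under a distinct name, so a .wav input
        # is never overwritten by its own output.
        dst = str(p.with_name(f"{p.stem}.16k.wav"))
    return [
        "ffmpeg",
        "-y",                      # overwrite dst without prompting
        "-i", src,                 # input: mp3, m4a, mp4, ...
        "-ac", "1",                # downmix to mono
        "-ar", str(sample_rate),   # resample to sample_rate Hz
        "-c:a", "pcm_s16le",       # encode as 16-bit little-endian PCM
        dst,
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the command for inspection; execute it with subprocess.run(cmd, check=True).
    print(shlex.join(ffmpeg_to_wav_cmd(sys.argv[1])))
```

For example, ffmpeg_to_wav_cmd("interview.m4a") builds a command that writes interview.16k.wav; passing the list to subprocess.run(cmd, check=True) runs it and raises if ffmpeg fails. Building the argument list separately from running it keeps the step easy to inspect and test.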
Offer a free basic version with limited features, then charge for advanced capabilities like batch processing, custom vocabulary, or premium model support. Target individual creators and small teams.
Sell licenses to organizations requiring fully offline, secure transcription for sensitive audio data. Emphasize compliance with data protection regulations and Apple Silicon optimization.
Package the skill as an SDK or API for integration into other macOS applications. Charge developers for usage tiers, support, and customization services.
💬 Integration Tip
Ensure Homebrew (brew), ffmpeg, and uv are installed first, and account for the delay of the initial model download in your application flow.
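A minimal preflight check for the first half of this tip might look like the sketch below. It only verifies that the brew, ffmpeg, and uv executables are on PATH, using Python's shutil.which; the function name missing_tools is hypothetical, and the check does not cover the model download.

```python
import shutil
from typing import Callable, List, Optional, Sequence

# Command-line tools the integration tip says must be installed first.
REQUIRED_TOOLS = ("brew", "ffmpeg", "uv")


def missing_tools(
    tools: Sequence[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the entries of `tools` that cannot be found on PATH.

    `which` defaults to shutil.which and can be replaced for testing.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("brew, ffmpeg, and uv are all on PATH")
```

If ffmpeg or uv is missing, both can typically be installed with brew install ffmpeg uv; Homebrew itself has to be installed separately.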
Scored Apr 19, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with a UX modeled on the macOS say command.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud).
Start voice calls via the OpenClaw voice-call plugin.