speech-to-text-transcription
Transcribe audio and video files to text with speaker detection, timestamps, and format conversion.
Install via ClawdBot CLI:
clawdbot install ivangdavila/speech-to-text-transcription
Requires:
Grade: Fair — based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Accesses sensitive credential files or environment variables: $OPENAI
Calls external URL not in known-safe list: https://clawic.com/skills/speech-to-text-transcription
Uses known external API (expected, informational): api.openai.com
AI Analysis
The skill's external API usage (OpenAI, AssemblyAI, Deepgram) is consistent with its stated transcription purpose, and it explicitly recommends local processing with Whisper for privacy. The primary risk is potential credential access via environment variables, but this is a standard pattern for optional cloud services.
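The "standard pattern" the analysis describes can be sketched as a guarded lookup: read the key from the environment only when cloud transcription is requested, and fall back to local Whisper otherwise. This is an illustrative sketch, not the skill's actual code; `OPENAI_API_KEY` is the conventional variable name for the OpenAI SDK, but the variable this skill reads may differ (the listing shows only a truncated `$OPENAI`).

```python
import os

def get_openai_key():
    """Read the OpenAI API key from the environment; return None when it is
    unset so callers can fall back to local processing instead of failing."""
    key = os.environ.get("OPENAI_API_KEY")
    # Treat an empty string the same as unset.
    return key if key else None

def choose_backend():
    """Use the cloud API only when a key is present; otherwise stay local."""
    return "openai-api" if get_openai_key() else "local-whisper"
```

Keeping the credential read behind a single function makes the "only when needed" behavior auditable: the key is never touched on the local-Whisper path.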
Audited Apr 16, 2026 · audit v1.0
Generated Mar 21, 2026
Transcribes university lectures from video recordings into text with timestamps, enabling students to create searchable notes and study materials. Supports long durations and speaker diarization to distinguish between professor and student interactions.
Converts podcast audio files into transcripts for subtitles, show notes, and content repurposing. Uses speaker detection to label hosts and guests, and outputs formats like SRT for video platforms.
Transcribes business meetings and interviews, extracting action items and summaries for team collaboration. Handles multi-speaker content with diarization and ensures privacy by using local processing for sensitive discussions.
Transcribes voice memos from healthcare professionals into structured text for patient records. Requires high accuracy and can use local Whisper to maintain data privacy and compliance with regulations.
Transcribes audio recordings of legal depositions with precise timestamps and speaker identification for court documentation. Supports batch processing of long files and outputs in JSON for easy integration with case management systems.
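Several of the use cases above depend on rendering timestamped, speaker-labeled segments as SRT. A minimal sketch of that conversion, assuming segments shaped like `{"start", "end", "speaker", "text"}` (the segment schema and function names here are illustrative, not the skill's API):

```python
def fmt_timestamp(seconds):
    """Format a float of seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render a list of segment dicts as an SRT document, prefixing each
    cue with its speaker label when diarization provided one."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        text = seg["text"].strip()
        if seg.get("speaker"):
            text = f"{seg['speaker']}: {text}"
        cues.append(
            f"{i}\n{fmt_timestamp(seg['start'])} --> {fmt_timestamp(seg['end'])}\n{text}"
        )
    return "\n\n".join(cues) + "\n"
```

The same segment list can be dumped with `json.dumps` for the JSON output mentioned in the deposition use case, so one internal representation serves both formats.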
Offers basic transcription with local Whisper for free, while charging for premium features like cloud provider integrations (e.g., OpenAI Whisper API for higher accuracy) and advanced diarization. Revenue comes from subscription tiers based on usage limits and support.
Licenses the skill package to businesses for internal use, such as in corporate training or media production. Includes custom integrations, priority support, and volume discounts for large-scale transcription needs.
Operates as a transcription service agency using this skill to process client audio files efficiently. Charges per minute of audio transcribed, with added fees for rush jobs, multiple output formats, and data preprocessing.
💬 Integration Tip
Ensure ffmpeg is installed and configured for audio preprocessing, and set up environment variables for cloud API keys only when needed to avoid unnecessary data exposure.
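The ffmpeg preprocessing step typically means converting the input to mono 16 kHz PCM, the format Whisper models expect. A hedged sketch of building that command (run it with `subprocess.run`; the helper name is illustrative and the exact flags your pipeline needs may vary):

```python
def ffmpeg_preprocess_args(src, dst):
    """Build an ffmpeg command that extracts mono 16 kHz 16-bit PCM audio
    from an audio or video file, suitable as Whisper input."""
    return [
        "ffmpeg",
        "-i", src,           # input audio or video file
        "-ar", "16000",      # resample to 16 kHz
        "-ac", "1",          # downmix to mono
        "-c:a", "pcm_s16le", # 16-bit little-endian PCM
        "-y",                # overwrite the output without prompting
        dst,
    ]
```

Building the argument list as a Python list (rather than a shell string) avoids quoting bugs with filenames that contain spaces.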
Scored Apr 18, 2026
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with macOS-style `say` UX.
Local text-to-speech via sherpa-onnx (offline, no cloud).
Speak responses aloud on macOS using the built-in `say` command when user input indicates Voice Wake/voice recognition (for example, messages starting with "User talked via voice recognition on <device>").
Transcribe audio files to text using local Whisper (Docker). Use when receiving voice messages, audio files (.mp3, .m4a, .ogg, .wav, .webm), or when asked to transcribe audio content.
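The dispatch rule above ("use when receiving ... .mp3, .m4a, .ogg, .wav, .webm") amounts to an extension check. A small sketch, using the extensions listed in the description (the helper name is hypothetical):

```python
from pathlib import Path

# Extensions the skill description says it handles.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".ogg", ".wav", ".webm"}

def is_transcribable(path):
    """Return True when the file's extension (case-insensitive) matches
    one of the audio formats the skill accepts."""
    return Path(path).suffix.lower() in AUDIO_EXTENSIONS
```

Matching on `suffix.lower()` keeps the check case-insensitive, so voice memos named `MEMO.M4A` route to transcription too.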