audio
Process, enhance, and convert audio files with noise removal, normalization, format conversion, transcription, and podcast workflows.
Install via ClawdBot CLI:
clawdbot install ivangdavila/audio
Requires:
Grade Fair — based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Sends data to undocumented external endpoint (potential exfiltration)
POST → https://api.assemblyai.com/v2/transcript
Calls external URL not in known-safe list
https://api.assemblyai.com/v2/transcript
AI Analysis
The skill definition explicitly states that it does not access cloud services without the user's knowledge, but the rule-based signals indicate a POST request to an external transcription API (AssemblyAI). The documented scope and the actual behavior therefore conflict, which poses a privacy risk if user audio is sent without clear consent.
Audited Apr 16, 2026 · audit v1.0
Generated Mar 20, 2026
Podcasters can use this skill to normalize audio levels to platform loudness targets such as -16 LUFS (a common target for podcasts), remove background noise, and convert files to formats like MP3 or AAC for distribution. It supports each workflow step from raw recording to final export, helping the result sound professional.
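As a rough illustration of the normalization step, the sketch below builds an ffmpeg command line that uses ffmpeg's `loudnorm` filter with a -16 LUFS target. The function name, file names, and the true-peak and loudness-range values are assumptions chosen for the example; nothing here is taken from the skill's own implementation.

```python
def loudnorm_command(src: str, dst: str, target_lufs: float = -16.0) -> list[str]:
    """Build an ffmpeg argv that loudness-normalizes src and writes dst.

    Hypothetical helper for illustration only.
    """
    # loudnorm is ffmpeg's EBU R128 loudness filter:
    #   I   = integrated loudness target in LUFS
    #   TP  = true-peak ceiling in dBTP (assumed value)
    #   LRA = loudness range target (assumed value)
    audio_filter = f"loudnorm=I={target_lufs}:TP=-1.5:LRA=11"
    # -y overwrites dst if it already exists.
    return ["ffmpeg", "-y", "-i", src, "-af", audio_filter, dst]


# Example: print the command instead of running it.
print(" ".join(loudnorm_command("episode_raw.wav", "episode.mp3")))
```

The list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is installed. Building the argv as a list rather than a shell string avoids quoting problems with file names that contain spaces.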
Content creators can extract audio from videos, transcribe it into subtitles (SRT/VTT), and optimize audio quality for platforms like YouTube. This helps improve accessibility and engagement by providing clear, normalized audio and text transcripts.
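To make the subtitle step concrete, here is a minimal sketch that turns timed transcript segments into SRT text. It assumes the transcription step yields `(start_seconds, end_seconds, text)` tuples; that shape and both function names are hypothetical, not the skill's actual output format.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}"
        )
    # Cues are separated by a blank line.
    return "\n\n".join(cues) + "\n"


print(to_srt([(0.0, 1.5, "Welcome back."), (1.5, 4.0, "Today: audio cleanup.")]))
```

WebVTT is close to this format: it starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.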
Libraries, museums, or individuals can convert legacy audio formats to modern ones like FLAC for lossless archiving or MP3 for sharing. The skill handles format conversion while preserving quality, making it useful for digitization projects.
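A conversion step for a digitization project might look roughly like the following sketch, which picks an ffmpeg audio encoder from the target extension. The extension-to-encoder table and the function name are assumptions for illustration, and `libmp3lame` is only available in ffmpeg builds compiled with it.

```python
from pathlib import Path

# Assumed mapping from target extension to an ffmpeg audio encoder.
ENCODERS = {".flac": "flac", ".mp3": "libmp3lame"}


def convert_command(src: str, target_ext: str) -> list[str]:
    """Build an ffmpeg argv converting src to the given target extension.

    Hypothetical helper; raises KeyError for extensions not in ENCODERS.
    """
    encoder = ENCODERS[target_ext]
    dst = str(Path(src).with_suffix(target_ext))
    # -n makes ffmpeg refuse to overwrite an existing output file,
    # a cautious default when working with archival material.
    return ["ffmpeg", "-n", "-i", src, "-c:a", encoder, dst]


print(" ".join(convert_command("tape01.aiff", ".flac")))
```

FLAC is lossless, so converting from an uncompressed source such as WAV or AIFF preserves the audio exactly, while the MP3 path is lossy and better suited to sharing copies.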
Musicians and producers can separate stems (e.g., vocals, drums) using Demucs, apply noise reduction, and normalize tracks for streaming platforms. This aids in remixing, mastering, and preparing music for distribution on services like Spotify.
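For the stem-separation step, a minimal sketch of invoking the Demucs command-line tool could look like this. It assumes Demucs is installed and on the PATH; the helper name and default output directory are chosen for the example, and how the skill itself calls Demucs is not documented here.

```python
from typing import Optional


def demucs_command(
    track: str, out_dir: str = "separated", two_stems: Optional[str] = None
) -> list[str]:
    """Build a Demucs CLI argv for separating one track (hypothetical helper)."""
    # -o sets the directory where Demucs writes the separated stems.
    cmd = ["demucs", "-o", out_dir]
    if two_stems is not None:
        # --two-stems splits into the named stem plus everything else,
        # e.g. "vocals" yields a vocals file and an accompaniment file.
        cmd.append(f"--two-stems={two_stems}")
    cmd.append(track)
    return cmd


print(" ".join(demucs_command("song.wav", two_stems="vocals")))
```

Without `two_stems`, Demucs separates the track into its default set of stems, which takes longer but gives individual files for remixing.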
Businesses can transcribe training videos or meetings into text for documentation and compliance, while also enhancing audio clarity with noise removal. This supports accessibility initiatives and improves content usability for employees.
Offer basic audio processing (e.g., format conversion, noise removal) for free, with premium features like advanced transcription, stem separation, or batch processing available via subscription. This attracts a broad user base while monetizing power users.
License the skill as a white-label solution for podcast networks, video production studios, or streaming platforms to integrate audio processing into their workflows. Provide API access and custom integrations for automated processing.
Deploy the skill as a cloud-based API where users pay per task, such as per minute of transcription or per file processed. This model suits occasional users or developers needing on-demand audio enhancement without upfront costs.
💬 Integration Tip
Ensure ffmpeg and ffprobe are installed on the system. For advanced features like transcription, consider integrating the Whisper API or a local Whisper setup for better performance.
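One simple way to verify the tip above before running any workflow is to check that both executables are on the PATH. This is a sketch using only the Python standard library; the function name is hypothetical.

```python
import shutil


def missing_tools(tools: tuple[str, ...] = ("ffmpeg", "ffprobe")) -> list[str]:
    """Return the names of required executables that are not on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing required tools: " + ", ".join(missing))
else:
    print("ffmpeg and ffprobe are available.")
```

Failing early with a clear message is usually friendlier than letting the first ffmpeg call raise a file-not-found error midway through a batch.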
Scored Apr 18, 2026
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Local text-to-speech via sherpa-onnx (offline, no cloud)
Speak responses aloud on macOS using the built-in `say` command when user input indicates Voice Wake/voice recognition (for example, messages starting with "User talked via voice recognition on <device>").
Transcribe audio files to text using local Whisper (Docker). Use when receiving voice messages, audio files (.mp3, .m4a, .ogg, .wav, .webm), or when asked to transcribe audio content.