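speech-recognition
General-purpose speech recognition skill. Supports multiple audio formats (ogg/mp3/wav/m4a) and uses the SiliconFlow SenseVoice API for speech-to-text. Triggers when the user sends a voice message or an audio file, or needs audio transcribed.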
Install via ClawdBot CLI:
clawdbot install demo112/speech-recognition
Grade: Fair (based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals).
Sends data to undocumented external endpoint (potential exfiltration)
POST → https://api.siliconflow.cn/v1/audio/transcriptions
Calls external URL not in known-safe list
https://api.siliconflow.cn/v1/audio/transcriptions
AI Analysis
The skill's external API call to siliconflow.cn is explicitly documented for its stated purpose of speech recognition, with clear privacy disclosure that audio is uploaded to their servers. No hidden instructions, credential harvesting, or obfuscation are present, but the data transfer to a third-party service warrants user awareness.
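To make the data transfer concrete, here is a minimal Python sketch of the kind of request that would upload audio to the endpoint named in this audit. It is not the skill's actual code. The multipart field names (`model`, `file`), the model id `FunAudioLLM/SenseVoiceSmall`, the bearer-token header, and the `text` field in the response are assumptions based on the common OpenAI-style transcription API shape; check them against SiliconFlow's documentation before relying on them.

```python
import json
import os
import urllib.request
import uuid

ENDPOINT = "https://api.siliconflow.cn/v1/audio/transcriptions"
MODEL = "FunAudioLLM/SenseVoiceSmall"  # assumed model id, not confirmed by the listing


def build_request(audio_bytes, filename, api_key, model=MODEL):
    """Build (but do not send) a multipart/form-data POST carrying the audio."""
    boundary = uuid.uuid4().hex
    parts = [
        # Text field naming the model (field name is an assumption).
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # File field header, followed by the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        ENDPOINT,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path, api_key):
    """Upload the file at `path` and return the transcript text, if present."""
    with open(path, "rb") as f:
        req = build_request(f.read(), os.path.basename(path), api_key)
    with urllib.request.urlopen(req, timeout=60) as resp:
        # The "text" key is assumed; an empty string is returned if it is absent.
        return json.load(resp).get("text", "")
```

Everything passed as `audio_bytes` leaves the machine when `transcribe` runs, which is the transfer the analysis above asks users to be aware of.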
Audited Apr 16, 2026 · audit v1.0
Generated Mar 20, 2026
Transcribes customer service audio calls into text for analysis and record-keeping. Enables automated logging of inquiries and complaints, improving response tracking and compliance.
Converts recorded meeting audio into written transcripts for documentation. Facilitates easy sharing of key points and action items among team members, enhancing productivity.
Transcribes lecture recordings or educational podcasts into text to support students with hearing impairments or those who prefer reading. Aids in creating study materials and subtitles.
Generates text transcripts from audio in videos or podcasts for creating subtitles or closed captions. Helps content creators reach wider audiences and comply with accessibility standards.
Transcribes audio recordings of legal depositions or interviews into accurate text records. Supports legal professionals in case preparation and evidence organization.
Offers the speech recognition API to developers on a pay-per-use or subscription basis. Charges based on audio duration or number of requests, providing scalable access for various applications.
Sells customized licenses to businesses for integrating the skill into internal systems like CRM or collaboration tools. Includes support, customization, and volume discounts for large-scale deployments.
Provides basic transcription for free with limited usage, while charging for advanced features like higher accuracy, faster processing, or bulk file handling. Targets individual users and small teams.
💬 Integration Tip
Store the API key securely, and use FFmpeg to convert audio into a compatible format before upload.
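One way to follow this tip, sketched in Python under stated assumptions: the key is read from an environment variable instead of being hard-coded, and FFmpeg is invoked to produce a mono 16 kHz WAV file. The variable name `SILICONFLOW_API_KEY`, the `_16k.wav` output suffix, and the 16 kHz mono target are illustrative choices, not requirements stated by the skill.

```python
import os
import shutil
import subprocess
from pathlib import Path

API_KEY_ENV = "SILICONFLOW_API_KEY"  # hypothetical variable name


def load_api_key(env=os.environ):
    """Read the API key from the environment so it never lives in source code."""
    key = env.get(API_KEY_ENV, "").strip()
    if not key:
        raise RuntimeError(f"Set {API_KEY_ENV} in the environment")
    return key


def ffmpeg_command(src, sample_rate=16000):
    """Return the FFmpeg argument list that converts `src` to mono WAV."""
    src = Path(src)
    # Write to a new name so a .wav input is never overwritten in place.
    dst = src.with_name(src.stem + "_16k.wav")
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(src),            # input file
        "-ac", "1",                # one audio channel (mono)
        "-ar", str(sample_rate),   # output sample rate in Hz
        str(dst),
    ]


def convert(src):
    """Run the conversion and return the path of the converted file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    cmd = ffmpeg_command(src)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.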
Scored Apr 16, 2026
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Local text-to-speech via sherpa-onnx (offline, no cloud)
Speak responses aloud on macOS using the built-in `say` command when user input indicates Voice Wake/voice recognition (for example, messages starting with "User talked via voice recognition on <device>").
Transcribe audio files to text using local Whisper (Docker). Use when receiving voice messages, audio files (.mp3, .m4a, .ogg, .wav, .webm), or when asked to transcribe audio content.