volcengine-stt
Transcribe audio to text using Volcano Engine (Volcengine/ARK) speech-to-text APIs. Use when the user wants to replace Whisper/OpenAI STT with Volcengine, tr...
Install via ClawdBot CLI:
clawdbot install reed1898/volcengine-stt
Grade Fair: based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Accesses system directories or attempts privilege escalation
/proc/
Calls external URL not in known-safe list
https://ark.cn-beijing.volces.com/api/v3}/audio/transcriptions`
Audited Apr 17, 2026 · audit v1.0
Generated Mar 20, 2026
Users can transcribe voice messages from Telegram into text using Volcengine STT, replacing Whisper for cost or regional compliance. This is useful for archiving conversations or enabling text-based search in messaging apps.
Integrate this skill into Discord bots to automatically transcribe voice chats or audio uploads, enhancing accessibility and content moderation. It allows real-time text conversion for community servers.
Businesses can use Volcengine STT to transcribe customer support calls in multiple languages, passing language and prompt hints to improve accuracy. The resulting transcripts can be analyzed to improve service quality.
Educators and content creators can transcribe audio lectures or podcasts into text for subtitles, notes, or accessibility purposes. The language and prompt flags help handle specialized terminology.
Offer transcription services via API to other developers or businesses, charging per audio minute or API call. This leverages Volcengine's infrastructure for scalable, reliable STT without heavy upfront investment.
Build and sell pre-integrated solutions for platforms like Telegram or Discord, where users pay for seamless voice-to-text features. This adds value to existing apps with minimal setup required.
Use transcribed text from calls or media to provide analytics insights, such as sentiment analysis or keyword tracking, to businesses. This turns raw audio into actionable data for decision-making.
💬 Integration Tip
Store API keys securely in environment variables and use the --json flag for debugging API responses during integration.
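As a minimal sketch of the tip above, the helper below reads the key from an environment variable and fails early when it is missing. The variable name `VOLCENGINE_API_KEY` and the bearer-token header are assumptions for illustration; the listing does not state which variable or auth scheme the skill actually uses.

```python
import os

# Hypothetical variable name; the skill's real variable is not given in this listing.
API_KEY_ENV = "VOLCENGINE_API_KEY"


def load_api_key(env_name: str = API_KEY_ENV) -> str:
    """Return the API key from the environment, failing loudly if it is unset."""
    key = os.environ.get(env_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_name} before running the transcription skill")
    return key


def auth_headers(api_key: str) -> dict:
    """Build a bearer-token header (assumed auth scheme, not confirmed by the source)."""
    return {"Authorization": f"Bearer {api_key}"}
```

Keeping the key out of source files and command-line arguments means it does not end up in version control or shell history, and the early `RuntimeError` surfaces a missing key before any audio is uploaded.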
Scored Apr 19, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.
End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.