modesty-asr
Fast, accurate automatic speech-to-text transcription supporting 100 languages from URLs or local files via SkillBoss API Hub.
Install via ClawdBot CLI:
clawdbot install modestyrichards/modesty-asr
Grade: Fair, based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Calls external URL not in known-safe list
https://heybossai.com
Audited Apr 18, 2026 · audit v1.0
Generated May 5, 2026
Transcribe customer support calls in real time to analyze agent performance and customer sentiment. Supports 100 languages for global call centers.
Convert doctors' voice notes into structured text for electronic health records, reducing manual entry errors. Works with local files or URLs from recording devices.
Automatically generate text transcripts of podcast episodes or video content for SEO and accessibility. Upload media files directly for batch processing.
Transcribe team meetings from recorded URLs to create searchable meeting notes. Use language hints to improve accuracy for multilingual teams.
Provide text transcripts of language lessons for learners to read along and improve comprehension. Auto-detect language for diverse educational content.
Charge customers per minute of audio transcribed, with tiered plans for high-volume users. Integrate billing via SkillBoss API usage tracking.
Offer a free tier with a monthly minute cap to attract users, then upsell to paid plans for additional minutes or advanced features like language detection.
License the skill to large organizations needing dedicated transcription capacity, custom language models, or on-premise deployment.
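The per-minute and free-tier ideas above can be sketched as a small billing calculation. This is only an illustration: the free allowance, tier boundaries, and rates are hypothetical placeholders, and how monthly minutes would be read from SkillBoss API usage tracking is not shown in this listing.

```python
from decimal import Decimal

# Hypothetical example values, not real pricing.
FREE_MINUTES = 10
TIERS = [(100, "0.10"), (None, "0.05")]  # (upper bound in billable minutes, rate per minute)


def monthly_charge(minutes, free_minutes, tiers):
    """Graduated per-minute charge after subtracting the free monthly allowance."""
    billable = max(Decimal(0), Decimal(minutes) - Decimal(free_minutes))
    total = Decimal(0)
    lower = Decimal(0)  # billable minutes already priced by earlier tiers
    for upper, rate in tiers:
        if billable <= lower:
            break
        # Price only the slice of usage that falls inside this tier.
        cap = billable if upper is None else min(billable, Decimal(upper))
        total += (cap - lower) * Decimal(rate)
        lower = cap
    return total
```

With these placeholder values, usage under the free cap costs nothing, and heavier usage is charged at the lower rate only for the minutes beyond the first tier.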
💬 Integration Tip
Ensure the SKILLBOSS_API_KEY environment variable is set before calling the skill; use the --language flag for better accuracy on non-English audio.
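A minimal sketch of the check described in the tip, assuming a wrapper script written in Python. Only the SKILLBOSS_API_KEY variable and the --language flag come from this listing; the helper names, the positional audio source, and the form of the language value are assumptions, since the skill's actual invocation is not shown here.

```python
import os

API_KEY_VAR = "SKILLBOSS_API_KEY"  # variable name taken from the tip above


def require_api_key(env=None):
    """Return the API key, or raise if the variable is unset or empty."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{API_KEY_VAR} is not set")
    return key


def build_args(source, language=None):
    """Build an argument list: the audio source, plus --language when a hint is given.

    Passing the source as a positional argument is an assumption for illustration.
    """
    args = [source]
    if language:
        args += ["--language", language]
    return args
```

Calling require_api_key() first makes a missing key fail early with a clear message, before any audio is sent for transcription.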
Scored May 5, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud).
Start voice calls via the OpenClaw voice-call plugin.