modesty-stt
Transcribe audio files using SkillBoss API Hub STT
Install via ClawdBot CLI:
clawdbot install modestyrichards/modesty-stt
Grade: Limited, based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Calls external URL not in known-safe list
https://api.heybossai.com/v1
Audited Apr 22, 2026 · audit v1.0
Generated May 5, 2026
Transcribe audio recordings of meetings to generate written minutes. Helps teams capture discussions and action items without manual note-taking.
Automatically transcribe incoming voice messages from users in messaging apps like Telegram. Enables text-based analysis and responses in customer service or personal assistant bots.
Convert podcast audio files into text for show notes, search indexing, or accessibility. Allows content creators to repurpose audio into blog posts or social media snippets.
Transcribe customer support calls to log interactions and extract key information. Helps improve quality assurance and train support agents.
Offer transcription as a pay-per-minute service. Customers pay based on audio duration processed, suitable for sporadic or varying volumes.
Provide monthly subscription plans with a set number of hours included. Overages are billed at a lower per-minute rate, encouraging regular use.
Embed the transcription API into existing platforms (e.g., CRM, helpdesk). Charge a licensing fee per user or per integration.
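The pay-per-minute and subscription-with-overage models above come down to simple arithmetic, sketched here. All rates, fees, and allowances are hypothetical placeholders supplied by the caller; the listing does not state any prices.

```python
def pay_per_minute_charge(minutes_used: float, rate_per_minute: float) -> float:
    """Pay-per-minute model: charge is proportional to audio duration."""
    return minutes_used * rate_per_minute


def subscription_charge(
    minutes_used: float,
    plan_fee: float,
    included_minutes: float,
    overage_rate: float,
) -> float:
    """Subscription model: flat fee plus overage for minutes beyond the allowance."""
    # Minutes inside the allowance cost nothing extra.
    overage_minutes = max(0.0, minutes_used - included_minutes)
    return plan_fee + overage_minutes * overage_rate
```

Keeping the two models as separate functions makes it easy to compare what the same usage would cost under each plan.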
💬 Integration Tip
Set the SKILLBOSS_API_KEY environment variable and call the transcribe script with the audio file path; output is plain text for easy parsing.
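As a rough illustration of the tip above, the following sketch wraps the script call from Python. The script path `./transcribe.sh` and the helper names are assumptions, since the listing does not name the script; only the `SKILLBOSS_API_KEY` variable and the plain-text output come from the tip itself.

```python
import os
import subprocess

# Hypothetical path: the listing does not say what the transcribe script is called.
TRANSCRIBE_SCRIPT = "./transcribe.sh"


def build_command(audio_path: str, script: str = TRANSCRIBE_SCRIPT) -> list[str]:
    """Return the argument list for running the script on one audio file."""
    return [script, audio_path]


def transcribe(audio_path: str, script: str = TRANSCRIBE_SCRIPT) -> str:
    """Run the script and return its plain-text output, trimmed of whitespace."""
    # Fail early with a clear message if the API key is missing.
    if not os.environ.get("SKILLBOSS_API_KEY"):
        raise RuntimeError("SKILLBOSS_API_KEY is not set")
    result = subprocess.run(
        build_command(audio_path, script),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode output as text
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Because the output is plain text, the returned string can be passed directly to downstream parsing or stored as is. The child process inherits the environment, so the key set in the shell is visible to the script.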
Scored Apr 22, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud)
Start voice calls via the OpenClaw voice-call plugin.