DISABLE_TELEMETRY=1 to opt out before using.
openai-whispers
Local speech-to-text with the Whisper CLI (no API key). And also 50+ models for image generation, video generation, text-to-speech, speech-to-text, music, ch...
Install via ClawdBot CLI:
clawdbot install modestyrichards/openai-whispers
Grade: Limited, based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Sends data to undocumented external endpoint (potential exfiltration)
POST → https://api.heybossai.com/v1/chat/completions
Calls external URL not in known-safe list
https://api.heybossai.com/v1
AI Analysis
The skill sends user data to an undocumented external API (SkillBoss) that offers far broader capabilities than the stated 'local speech-to-text' purpose, including chat, image generation, and web search. This constitutes unauthorized data exfiltration and a significant scope mismatch, posing both security and privacy risks.
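The "not in known-safe list" finding above can be illustrated with a minimal sketch of an endpoint allowlist check. The auditor's actual list and logic are not shown in this listing, so `KNOWN_SAFE_HOSTS`, `is_known_safe`, and `flag_unknown_endpoints` are hypothetical names, and the two allowlisted hosts are placeholders chosen for illustration.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the registry's real known-safe list is not published here.
KNOWN_SAFE_HOSTS = {"api.openai.com", "api.elevenlabs.io"}


def is_known_safe(url, allowed_hosts=KNOWN_SAFE_HOSTS):
    """Return True only for HTTPS URLs whose host is exactly on the allowlist."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "https" and host in allowed_hosts


def flag_unknown_endpoints(urls, allowed_hosts=KNOWN_SAFE_HOSTS):
    """Return the URLs that an audit of this kind would flag for review."""
    return [u for u in urls if not is_known_safe(u, allowed_hosts)]


if __name__ == "__main__":
    found = [
        "https://api.heybossai.com/v1/chat/completions",
        "https://api.openai.com/v1/audio/transcriptions",
    ]
    for url in flag_unknown_endpoints(found):
        print("not in known-safe list:", url)
```

Matching on the exact hostname, rather than on a substring of the URL, avoids treating a look-alike domain that merely contains a trusted name as safe.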
Audited Apr 17, 2026 · audit v1.0
Generated Mar 20, 2026
A marketing agency uses the skill to generate images, videos, and music for social media campaigns, leveraging multiple models for diverse creative assets. It streamlines production by handling tasks like background removal and text-to-speech for voiceovers in a unified workflow.
An e-learning platform integrates the skill to transcribe lecture audio into text, generate educational videos from prompts, and create interactive chat-based tutoring with AI models. This enhances accessibility and provides personalized learning materials efficiently.
A SaaS company employs the skill for speech-to-text to transcribe customer calls, generate automated email responses using chat models, and create video tutorials for product support. This reduces manual effort and improves response times.
A film or podcast studio uses the skill to generate background music, remove backgrounds from images for promotional materials, and convert scripts into speech for voice-overs. It enables rapid prototyping and cost-effective content creation across media types.
Offer a white-labeled API service to developers, providing access to multiple AI models through a single key for tasks like image generation and speech-to-text. Charge based on usage tiers or per-request fees, targeting startups and enterprises needing scalable AI solutions.
Develop a web or mobile app that allows users to generate images, videos, and music for free with basic features, then upsell premium features like higher-quality outputs or advanced models. Monetize through in-app purchases and premium subscriptions.
Provide custom integration services for businesses to embed the skill into their workflows, such as automating document processing or enhancing customer interactions with AI chat. Revenue comes from one-time setup fees and ongoing support contracts.
💬 Integration Tip
Ensure the SKILLBOSS_API_KEY is securely stored and use the Bash tool for executing curl commands, handling responses with jq for parsing JSON outputs effectively.
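A minimal Python sketch of the same idea, offered as an alternative to the curl and jq workflow the tip describes: read the key from the environment rather than hardcoding it, mask it in logs, and pull one field out of a JSON response. No request is sent. The helper names (`load_api_key`, `mask`, `pick`) and the sample response shape are assumptions for illustration, since the listing does not document the API's schema.

```python
import json
import os


def load_api_key(env=os.environ, name="SKILLBOSS_API_KEY"):
    """Read the key from the environment; fail loudly if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it instead of hardcoding it")
    return key


def mask(secret, visible=4):
    """Hide all but the last few characters so the key is safe to log."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


def pick(document, *path):
    """Walk parsed JSON by keys and indices, roughly like a jq path; None if absent."""
    current = document
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return None
    return current


if __name__ == "__main__":
    # Hypothetical response body; the real schema is not documented in this listing.
    body = json.loads('{"choices": [{"message": {"content": "hello"}}]}')
    print(pick(body, "choices", 0, "message", "content"))
    print(mask("sk-example-1234"))
```

Returning `None` for a missing path keeps a malformed or unexpected response from raising deep inside the parsing step, which is roughly the role a tolerant jq filter plays in a shell pipeline.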
Scored Apr 19, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud)
Start voice calls via the OpenClaw voice-call plugin.