mm-easy-voice
Simple text-to-speech skill using the MiniMax Voice API. Converts text to audio with customizable voice selection. Use for generating speech audio from text.
Install via ClawdBot CLI:
clawdbot install blue-coconut/mm-easy-voice
Grade Limited: based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Sends data to undocumented external endpoint (potential exfiltration)
Upload → https://platform.minimaxi.com/docs/api-reference/voice-cloning-uploadcloneaudio
Calls external URL not in known-safe list
https://api.minimaxi.com/v1
Audited Apr 17, 2026 · audit v1.0
Generated Mar 22, 2026
Instructors and course developers can use this skill to generate voiceovers for educational videos and interactive modules. It supports multiple languages and customizable voices, making it ideal for creating engaging audio content without hiring voice actors.
Publishers and independent authors can convert book manuscripts into audio files efficiently. The skill handles long texts by splitting requests and allows emotion matching, enabling high-quality narration for fiction and non-fiction titles.
Businesses can integrate this skill into IVR systems or chatbots to provide natural-sounding voice responses. Customizable voices and pause insertion help create clear, professional audio prompts for customer interactions.
Marketing teams can generate voiceovers for promotional videos, social media ads, and podcasts. The voice cloning and design features allow brands to create unique, consistent audio identities across campaigns.
Developers can build applications that convert text to speech for visually impaired users or language learners. The skill's simple API and support for multiple voices facilitate integration into assistive technologies and educational apps.
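The audiobook use case above mentions handling long texts by splitting requests. The listing does not show how the skill does this, so the following is only a minimal sketch of one common approach: break the text at sentence boundaries and pack sentences into chunks below a size limit. The function name `split_text` and the `max_chars` default of 1000 are hypothetical, not taken from the skill or from the MiniMax API.

```python
import re


def split_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: the 1000-character default is an assumption,
    not a documented MiniMax limit.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at fixed offsets.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for i, chunk in enumerate(split_text("One. Two. Three.", max_chars=9)):
        print(i, chunk)
```

Each chunk would then be sent as its own request and the resulting audio segments joined afterwards. Splitting on sentence boundaries keeps pauses natural at the joins.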
Offer a cloud-based platform where users pay a monthly fee for access to the text-to-speech API with advanced features like voice cloning and high usage limits. This model provides recurring revenue and scales with customer demand.
Charge customers based on the number of characters processed or audio minutes generated. This model appeals to businesses with variable usage needs, such as startups or seasonal campaigns, and can be integrated into existing billing systems.
License the skill to enterprises or resellers who rebrand it as part of their own products, such as e-learning platforms or call center software. This model generates upfront licensing fees and ongoing support contracts.
💬 Integration Tip
Ensure the MINIMAX_VOICE_API_KEY environment variable is set before running scripts, and use the check_environment.py tool to verify dependencies like FFmpeg for audio processing.
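The kind of preflight check this tip describes can be sketched in a few lines. This is not the skill's `check_environment.py`, whose contents are not shown here. It is a hypothetical stand-in (`preflight_problems` is an invented name) that assumes only that the API key lives in `MINIMAX_VOICE_API_KEY` and that the FFmpeg executable is named `ffmpeg` on the `PATH`.

```python
import os
import shutil


def preflight_problems(env=None, which=shutil.which):
    """Return a list of human-readable problems; empty means ready.

    Hypothetical helper, not part of the skill. `env` and `which`
    are injectable so the check can be exercised without touching
    the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("MINIMAX_VOICE_API_KEY"):
        problems.append("MINIMAX_VOICE_API_KEY is not set")
    # Assumes the FFmpeg binary is called "ffmpeg".
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    issues = preflight_problems()
    if issues:
        for issue in issues:
            print("Problem:", issue)
    else:
        print("Environment looks ready.")
```

Running a check like this before the first API call surfaces configuration errors early, instead of partway through audio generation.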
Scored Apr 19, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Start voice calls via the OpenClaw voice-call plugin.
Local text-to-speech via sherpa-onnx (offline, no cloud).