mm-voice-maker
Enables voice synthesis, voice cloning, voice design, and audio post-processing using MiniMax Voice API and FFmpeg. Use when converting text to speech, creat...
Install via ClawdBot CLI:
clawdbot install blue-coconut/mm-voice-maker
Grade Fair: based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Sends data to undocumented external endpoint (potential exfiltration)
Upload → https://platform.minimaxi.com/docs/api-reference/voice-cloning-uploadcloneaudio
Calls external URL not in known-safe list
https://api.minimaxi.com/v1
AI Analysis
The skill sends user data to a documented third-party API (MiniMax) for legitimate voice synthesis functions, which is consistent with its stated purpose. No credential harvesting, hidden instructions, or obfuscation were found in the provided definition. The risk is low as the external endpoint is a known commercial service for the skill's core functionality.
Audited Apr 18, 2026 · audit v1.0
Generated Mar 22, 2026
Publishers and authors can convert written novels into audiobooks with distinct voices for narrator and characters, enhancing listener immersion. The segment-based TTS supports multi-voice narration, allowing for dynamic dialogue and emotional expression.
E-learning platforms and educators can generate voiceovers for online courses, tutorials, and language learning materials. Multi-voice capabilities enable clear differentiation between instructors, students, and example dialogues, improving engagement.
Media companies and independent creators can automate the production of podcasts or interview summaries by synthesizing host and guest voices. This reduces recording time and costs while maintaining a natural, conversational flow.
Businesses can create professional voiceovers for internal training videos, corporate announcements, and safety protocols. The skill ensures consistent, clear narration with options for formal or neutral tones suitable for professional environments.
Organizations can convert text-based content like websites, documents, or public announcements into speech for visually impaired users. Voice cloning allows for personalized, familiar voices, enhancing accessibility and user experience.
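The multi-voice use cases above depend on splitting a script into per-speaker segments before synthesis. The following is a minimal sketch of that splitting step only, assuming a plain-text script in which dialogue lines start with "Name:" and everything else belongs to the narrator. The names Segment and split_script, and the voice IDs in the usage note, are hypothetical placeholders and are not taken from the skill or from the MiniMax API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One chunk of text to be synthesized with a single voice."""
    speaker: str
    voice_id: str  # placeholder ID; real voice IDs come from the TTS provider
    text: str


def split_script(script, voice_map, default_speaker="narrator"):
    """Split a script into segments, one per non-empty line.

    Assumption: a line of the form "Name: text" is dialogue when Name
    (case-insensitive) is a key of voice_map; any other line is narration.
    """
    segments = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        speaker, sep, rest = line.partition(":")
        key = speaker.strip().lower()
        if sep and key in voice_map:
            segments.append(Segment(key, voice_map[key], rest.strip()))
        else:
            # Unknown prefix or no colon: keep the whole line as narration.
            segments.append(
                Segment(default_speaker, voice_map[default_speaker], line)
            )
    return segments
```

For example, split_script("The door opened.\nAlice: Who is there?", {"narrator": "voice-a", "alice": "voice-b"}) yields a narrator segment followed by an "alice" segment. Each segment could then be passed to whatever synthesis call the skill provides.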
Offer tiered subscription plans for developers and businesses to access the MiniMax Voice API through this skill, with limits on usage, voice cloning, and premium features. Revenue is generated from monthly or annual fees based on usage tiers.
Charge users per minute of generated audio, with additional fees for advanced features like voice cloning or high-quality emotions. This model suits occasional users or small projects, providing flexibility without long-term commitments.
License the skill as a customizable white-label solution for large companies in media, education, or corporate sectors, integrating it into their existing platforms. Revenue comes from one-time licensing fees and ongoing support contracts.
💬 Integration Tip
Ensure the MINIMAX_VOICE_API_KEY is set in the environment and verify FFmpeg installation for audio processing to avoid common errors during TTS generation.
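The check described in this tip can be sketched as a small preflight function. This is a minimal sketch, assuming the key is read from the MINIMAX_VOICE_API_KEY environment variable and that the FFmpeg binary is named ffmpeg and found through PATH. The function name preflight is hypothetical and is not part of the skill.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of problems; an empty list means the setup looks ready.

    `preflight` is a hypothetical helper, not part of the skill itself.
    `env` and `which` are injectable so the check can be exercised in tests.
    """
    env = os.environ if env is None else env
    problems = []
    # The tip names this variable; a blank value is treated as unset.
    if not env.get("MINIMAX_VOICE_API_KEY", "").strip():
        problems.append("MINIMAX_VOICE_API_KEY is not set")
    # Assumption: the FFmpeg executable is called `ffmpeg` and is on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems
```

Calling preflight() with no arguments checks the real environment. Running it before any TTS generation surfaces both problems at once instead of failing partway through a job.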
Scored Apr 19, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Start voice calls via the OpenClaw voice-call plugin.
Local text-to-speech via sherpa-onnx (offline, no cloud).