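voice-clone-tts
Voice cloning and speech synthesis. Upload an audio sample to clone a voice, then generate speech with the cloned voice or a preset voice. Supports multiple backends: MiniMax, ElevenLabs, Fish Audio, Azure TTS, OpenAI TTS. Supports emotion control, speed adjustment, and batch generation. Trigger words: 语音合成 (speech synthesis), TTS, 声纹克隆 (voice cloning), voice clone, text to speech, 配...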
Install via ClawdBot CLI:
clawdbot install oliviapp8/voice-clone-tts
Grade: Fair. Based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Sends data to undocumented external endpoint (potential exfiltration)
POST → https://api.minimax.chat/v1/text_to_speech
Calls external URL not in known-safe list
https://www.minimax.chat/
Uses known external API (expected, informational)
api.openai.com
AI Analysis
The skill's external API calls (MiniMax, ElevenLabs, etc.) are consistent with its stated purpose of voice cloning and TTS. While it sends user audio/text data to third-party services, this is expected functionality, not unauthorized exfiltration. No hidden instructions, credential harvesting, or obfuscation were found in the provided definition.
Generated May 8, 2026
Create video content with a consistent digital human avatar and a custom voice cloned from a short audio sample. The voice is synthesized per scene and synchronized with the avatar's mouth movements, enabling personalized spokesperson videos without reliance on platform-specific voice cloning.
Convert written content such as books, articles, or scripts into natural-sounding audio using cloned or preset voices. Supports batch generation with emotion and speed control for engaging listening experiences.
Generate voiceovers in multiple languages by using backends like ElevenLabs or Azure TTS. Cloned voices can be used across languages, enabling consistent brand voice for global audiences.
Integrate with chatbots or voice assistants to provide a unique, branded voice for responses. Clone a voice for personalized interaction or use preset voices for different personas.
Automate the dubbing of video scenes by processing a script with scene-by-scene narration, emotions, and speeds. Produces a set of audio files ready for video editing or direct synchronization.
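The scene-by-scene dubbing flow described above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the `Scene` shape, the `Synthesizer` callable signature, the `scene_NNN.mp3` naming scheme, and the `fake_synthesize` stand-in are all hypothetical, and a real backend (MiniMax, ElevenLabs, and so on) would replace the stand-in with its own API call.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable


@dataclass(frozen=True)
class Scene:
    """One narration line of a script (hypothetical shape)."""
    text: str
    emotion: str = "neutral"
    speed: float = 1.0


# Hypothetical backend signature: (text, voice_id, emotion, speed) -> audio bytes.
Synthesizer = Callable[[str, str, str, float], bytes]


def batch_synthesize(
    scenes: Iterable[Scene],
    voice_id: str,
    synthesize: Synthesizer,
    out_dir: str | Path,
    ext: str = "mp3",
) -> list[Path]:
    """Synthesize each non-empty scene and write one audio file per scene.

    File names keep the scene's position in the script (scene_001, scene_002, ...)
    so the output can be lined up with the video timeline later.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for index, scene in enumerate(scenes, start=1):
        if not scene.text.strip():
            continue  # nothing to narrate for this scene
        audio = synthesize(scene.text, voice_id, scene.emotion, scene.speed)
        target = out_path / f"scene_{index:03d}.{ext}"
        target.write_bytes(audio)
        written.append(target)
    return written


def fake_synthesize(text: str, voice_id: str, emotion: str, speed: float) -> bytes:
    """Stand-in backend for demonstration; returns a label, not real audio."""
    return f"{voice_id}|{emotion}|{speed}|{text}".encode("utf-8")
```

Passing the backend in as a callable keeps the batch loop independent of any one provider, so switching backends would only mean supplying a different function.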
Offer a monthly subscription granting access to voice cloning, TTS synthesis, and batch generation with a limited number of characters or minutes. Premium tiers add advanced emotions, higher quality, and more backends.
Provide API access for high-volume voice synthesis and cloning, integrating into existing production pipelines for dubbing studios, eLearning platforms, or video production houses.
License the voice cloning and TTS technology to digital human platforms that lack native voice cloning. The technology is integrated as a backend module, enabling the platform to offer custom voices to their users.
💬 Integration Tip
Automate the entire workflow by connecting the video-script-generator output to this skill for scenes and pipe the generated audio into digital-avatar or video-stitcher for seamless production.
Scored May 8, 2026
Audited Apr 16, 2026 · audit v1.0