zai-tts
Text-to-speech conversion using the GLM-TTS service via the `uvx zai-tts` command for generating audio from text. Use when (1) User requests audio/voice output w...
Install via ClawdBot CLI:
clawdbot install al-one/zai-tts
Grade: Fair, based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Calls external URL not in known-safe list
https://github.com/aahl/zai-tts
Audited Apr 16, 2026 · audit v1.0
Generated Mar 21, 2026
This skill can convert text content into audio, enabling visually impaired users to access written information through speech. It supports custom voice settings and speed adjustments to enhance listening comfort, making digital content more inclusive.
Creators can use this skill to generate high-quality voiceovers from scripts, streamlining the production of podcasts or audiobooks. By leveraging pre-cloned voices and adjustable parameters, it reduces recording time and costs while maintaining professional audio output.
Users can convert text-based instructions or articles into audio while driving, cooking, or exercising, allowing hands-free consumption of information. The skill's speed and volume controls help tailor the audio to different environments for better focus and safety.
Businesses can integrate this skill to create spoken versions of training materials, enhancing employee engagement through auditory learning. It supports multiple voice options to match different content tones, making educational resources more dynamic and accessible.
Companies can automate voice responses for customer inquiries by converting text replies into audio, improving service efficiency. The skill allows customization of voice characteristics to align with brand identity, providing a personalized touch in automated interactions.
Offer a platform where users pay a monthly fee to access premium voice options, higher audio quality, or increased usage limits for text-to-speech conversions. This model can target content creators and businesses needing regular audio output, generating recurring revenue.
License the skill's underlying technology as an API for developers to integrate into their applications, charging based on the number of audio generations or characters processed. This approach caters to tech companies seeking scalable TTS solutions without upfront development costs.
Provide basic text-to-speech functionality for free to attract a broad user base, while monetizing advanced features like custom voice cloning, faster processing, or ad-free experiences. This model encourages user adoption and upsells to premium tiers for enhanced capabilities.
💬 Integration Tip
Set the environment variables `ZAI_AUDIO_USERID` and `ZAI_AUDIO_TOKEN` before use. For smoother integration, consider automating voice selection based on content type.
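The tip above can be turned into a small preflight check that runs before the skill is invoked. The following is a minimal sketch, not part of zai-tts itself: the variable names `ZAI_AUDIO_USERID` and `ZAI_AUDIO_TOKEN` and the `uvx` launcher come from this listing, while the helper names `missing_zai_env` and `preflight` are hypothetical and chosen only for illustration. It does not call the service or assume any `zai-tts` command-line flags.

```python
import os
import shutil

# Variable names taken from the integration tip in this listing.
REQUIRED_VARS = ("ZAI_AUDIO_USERID", "ZAI_AUDIO_TOKEN")


def missing_zai_env(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def preflight(env=None):
    """Collect human-readable problems; an empty list means nothing was found.

    Hypothetical helper: it only checks configuration and that the `uvx`
    launcher is discoverable on PATH. It never contacts the TTS service.
    """
    problems = [f"{name} is not set" for name in missing_zai_env(env)]
    if shutil.which("uvx") is None:
        problems.append("uvx was not found on PATH")
    return problems
```

Calling `preflight()` with no arguments inspects the current process environment. A wrapper could print the returned messages and stop before running `uvx zai-tts` whenever the list is non-empty.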
Scored Apr 19, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud).
Start voice calls via the OpenClaw voice-call plugin.