zhipu-tts
Text-to-speech conversion using Zhipu AI (BigModel) GLM-TTS model. Use when you need to convert text to audio files with various voice options. Supports Chin...
Install via ClawdBot CLI:
clawdbot install franklu0819-lang/zhipu-tts
Convert Chinese text to natural-sounding speech using Zhipu AI's GLM-TTS model.
1. Get your API Key:
Get a key from Zhipu AI Console
2. Set it in your environment:
export ZHIPU_API_KEY="your-key-here"
Convert text to speech with default settings (tongtong voice, normal speed, WAV format):
bash scripts/text_to_speech.sh "你好,今天天气怎么样"
Specify voice, speed, format, and output filename:
bash scripts/text_to_speech.sh "欢迎使用智能语音服务" xiaochen 1.2 wav greeting.wav
Parameters:
text (required): Chinese text to convert (max 1024 characters)
voice (optional): tongtong (default), chuichui, xiaochen, jam, kazi, douji, luodo
speed (optional): Speech speed from 0.5 to 2.0 (default: 1.0)
output_format (optional): wav (default), pcm
output_file (optional): Output filename (default: output.{format})
Choose tongtong (default) for:
Choose chuichui for:
Choose xiaochen for:
Choose jam/kazi/douji/luodo for:
Recommended speeds:
WAV (recommended):
PCM:
Create a professional greeting:
bash scripts/text_to_speech.sh "您好,感谢致电智能客服,请按1选择中文服务" tongtong 1.0 wav greeting.wav
Generate an energetic announcement:
bash scripts/text_to_speech.sh "热烈欢迎各位嘉宾参加今天的活动!" xiaochen 1.3 wav announcement.wav
Create a calm narration:
bash scripts/text_to_speech.sh "在这个宁静的夜晚,让我们一起欣赏美丽的星空" chuichui 0.9 wav narration.wav
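Before running commands like the ones above, the arguments can be checked against the documented limits (1024 characters, the seven listed voices, speed 0.5 to 2.0, wav or pcm). The following is a minimal sketch only: validate_tts_args is a hypothetical helper name, not part of the skill's scripts, and the character count is only accurate in a UTF-8 locale.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the skill): checks arguments against
# the limits listed under "Parameters" before calling scripts/text_to_speech.sh.
validate_tts_args() {
  local text="$1" voice="${2:-tongtong}" speed="${3:-1.0}" format="${4:-wav}"

  # text is required; ${#text} counts characters only in a UTF-8 locale
  if [ -z "$text" ]; then
    echo "error: text is required" >&2
    return 1
  fi
  if [ "${#text}" -gt 1024 ]; then
    echo "error: text exceeds 1024 characters" >&2
    return 1
  fi

  # voice must be one of the documented names
  case "$voice" in
    tongtong|chuichui|xiaochen|jam|kazi|douji|luodo) ;;
    *) echo "error: unknown voice: $voice" >&2; return 1 ;;
  esac

  # speed must be a plain number between 0.5 and 2.0; awk does the float comparison
  if ! awk -v s="$speed" 'BEGIN { exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s >= 0.5 && s <= 2.0) }'; then
    echo "error: speed must be between 0.5 and 2.0" >&2
    return 1
  fi

  # output format must be wav or pcm
  case "$format" in
    wav|pcm) ;;
    *) echo "error: unknown format: $format" >&2; return 1 ;;
  esac
}
```

A possible use is to chain it in front of the real call, for example: validate_tts_args "$text" xiaochen 1.2 wav && bash scripts/text_to_speech.sh "$text" xiaochen 1.2 wav greeting.wav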
Best practices:
Sample rate: Generated audio uses a 24000 Hz sampling rate for optimal quality.
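Because raw PCM output carries no header, the sample rate is what lets you work out how long a file plays. The sketch below estimates duration from file size. Only the 24000 Hz rate comes from this page; the 16-bit mono sample layout is an assumption, and pcm_duration_ms is a hypothetical helper name.

```shell
# Estimate the playing time of a raw PCM file in milliseconds.
# Assumption (not stated in the source): 16-bit (2-byte) mono samples.
# Only the 24000 Hz sample rate is documented.
pcm_duration_ms() {
  local file="$1" rate="${2:-24000}" bytes_per_sample="${3:-2}" channels="${4:-1}"
  local size
  # wc -c reports the file size in bytes
  size=$(wc -c < "$file") || return 1
  # duration = bytes / (rate * bytes per sample * channels), scaled to ms
  echo $(( size * 1000 / (rate * bytes_per_sample * channels) ))
}
```

If the real sample width or channel count differs, pass them as the third and fourth arguments, for example: pcm_duration_ms output.pcm 24000 2 1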
Text Length Issues:
Audio Quality Issues:
File Playback Issues:
Generated Mar 1, 2026
Automates voice responses for IVR systems or chatbots, providing natural-sounding Chinese greetings and menu options. Reduces human agent workload and ensures consistent, professional tone across interactions.
Generates audio narrations for e-learning modules, language courses, or audiobooks in Chinese. Allows customization with different voices and speeds to match educational tones, enhancing accessibility and engagement.
Creates character voices for animations, podcasts, or video games using specialized voices like jam or luodo. Supports creative projects with varied personas and speeds for dynamic audio content.
Produces audio for corporate presentations, event announcements, or training materials with voices like tongtong or chuichui. Ensures clear, authoritative delivery suitable for business environments.
Converts written Chinese text into speech for visually impaired users or applications requiring audio output. Provides options for slower speeds and clear voices to improve comprehension and usability.
Offers pay-per-use or subscription access to the TTS API for developers and businesses. Generates revenue through usage tiers, with potential for premium features like higher limits or custom voices.
Licenses the TTS technology to other companies for integration into their products, such as call centers or educational platforms. Provides customization and support services for a recurring fee.
Operates a platform where users can generate and sell audio content, such as narrations or voice-overs, using the TTS tool. Takes a commission on sales or offers premium generation credits.
💬 Integration Tip
Ensure ZHIPU_API_KEY is set in the environment and install required tools like jq for smooth script execution.
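The tip above can be turned into a quick check to run before the script. This is a minimal sketch, assuming jq is the only extra tool needed (the page names it only as an example); preflight is a hypothetical function name and is not part of the skill.

```shell
# Hypothetical preflight check (not shipped with the skill).
# Usage: preflight [tool ...]   (defaults to checking for jq)
preflight() {
  local status=0 tool
  if [ "$#" -eq 0 ]; then
    set -- jq
  fi

  # The API key must be present and non-empty in the environment
  if [ -z "${ZHIPU_API_KEY:-}" ]; then
    echo "ZHIPU_API_KEY is not set" >&2
    status=1
  fi

  # Each required tool must be found on PATH
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool is not installed" >&2
      status=1
    fi
  done

  return "$status"
}
```

Running preflight && bash scripts/text_to_speech.sh "你好" would then stop early with a readable message instead of failing partway through the request.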