siliconflow-tts-gen

Text-to-Speech using the SiliconFlow API with the CosyVoice2 model. Supports 8 preset voices, multiple languages, and Chinese dialects.

Install via ClawdBot CLI:

clawdbot install lilei0311/siliconflow-tts-gen

or via clawhub:

npx clawhub install siliconflow-tts-gen

Credentials: set the SILICONFLOW_API_KEY environment variable, or place the key in ~/.openclaw/openclaw.json (for auto-detect).
Set your SiliconFlow API key:
export SILICONFLOW_API_KEY="your-api-key"
# List available voices
python3 scripts/generate.py --list-voices
# Basic usage (default voice: alex)
python3 scripts/generate.py "你好,世界"
# Specify voice
python3 scripts/generate.py "Hello World" --voice bella
# Adjust speed
python3 scripts/generate.py "你好" --voice claire --speed 0.9
# Save to file
python3 scripts/generate.py "欢迎收听" --output welcome.mp3
# Change format
python3 scripts/generate.py "Hello" --format wav
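For direct API integration without the wrapper script, the request the commands above ultimately make can be sketched as follows. This is a minimal sketch: the endpoint path and the `<model>:<voice>` naming scheme are assumptions based on SiliconFlow's OpenAI-compatible speech API, not read from this skill's source.

```python
# Sketch of the HTTP call scripts/generate.py presumably makes.
# Endpoint, model ID, and voice naming are assumptions.
import json
import os
import urllib.request

API_URL = "https://api.siliconflow.cn/v1/audio/speech"  # assumed endpoint
MODEL = "FunAudioLLM/CosyVoice2-0.5B"                   # assumed model ID


def build_payload(text: str, voice: str = "alex",
                  speed: float = 1.0, fmt: str = "mp3") -> dict:
    """Assemble the request body for one synthesis call."""
    return {
        "model": MODEL,
        "input": text,
        "voice": f"{MODEL}:{voice}",  # assumed "<model>:<voice-id>" format
        "speed": speed,
        "response_format": fmt,
    }


def tts(text: str, output: str = "output.mp3", **kwargs) -> None:
    """POST the payload and write the returned audio bytes to `output`."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(text, **kwargs)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['SILICONFLOW_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp, open(output, "wb") as f:
        f.write(resp.read())


# Example (requires a valid SILICONFLOW_API_KEY):
# tts("你好,世界", voice="bella", speed=0.9, output="hello.mp3")
```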
Male voices:

| ID | Name | Characteristic |
|----|------|----------------|
| alex | 沉稳男声 (steady male) | Mature and steady |
| benjamin | 低沉男声 (deep male) | Deep and low |
| charles | 磁性男声 (magnetic male) | Magnetic |
| david | 欢快男声 (cheerful male) | Cheerful |

Female voices:

| ID | Name | Characteristic |
|----|------|----------------|
| anna | 沉稳女声 (steady female) | Mature and elegant |
| bella | 激情女声 (passionate female) | Passionate |
| claire | 温柔女声 (gentle female) | Gentle and kind |
| diana | 欢快女声 (cheerful female) | Sweet and happy |
| Parameter | Type | Default | Range | Description |
|-----------|------|---------|-------|-------------|
| --voice | string | alex | - | Voice ID |
| --speed | float | 1.0 | 0.25-4.0 | Speech speed |
| --format | string | mp3 | mp3/opus/wav/pcm | Output format |
| --output | string | output.mp3 | - | Output file path |
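The ranges in the parameter table can be enforced client-side before invoking the script, which gives clearer errors than a failed API call. A minimal sketch; the helper name and error messages are illustrative, not part of the skill:

```python
# Validate CLI-style parameters against the documented ranges.
VOICES = {"alex", "benjamin", "charles", "david",
          "anna", "bella", "claire", "diana"}
FORMATS = {"mp3", "opus", "wav", "pcm"}


def validate(voice: str = "alex", speed: float = 1.0,
             fmt: str = "mp3") -> dict:
    """Raise ValueError on out-of-range input; return normalized params."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"speed {speed} outside allowed range 0.25-4.0")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return {"voice": voice, "speed": speed, "format": fmt}
```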
Privacy and security:
- Reads ~/.openclaw/openclaw.json only to auto-detect API keys.
- API endpoint: api.siliconflow.cn.
- Review scripts/generate.py before providing credentials.

Author: MaxStorm Team
License: MIT
Generated Mar 1, 2026
This skill can be used to generate automated voice responses for customer service hotlines in multiple languages and dialects, such as Chinese, English, and Korean, reducing the need for human operators. It supports ultra-low latency for real-time interactions and can personalize voices to match brand tone.
Educators and e-learning platforms can use this skill to create audio versions of educational materials in various languages and dialects, making content accessible to diverse student populations. The voice cloning feature allows for rapid adaptation of familiar instructor voices across different regional accents.
Content creators can leverage this skill to produce audiobooks or podcasts with multiple character voices, enhancing storytelling with distinct male and female vocal profiles. The auto-download feature simplifies local file management for batch audio generation.
Developers can integrate this skill into applications to convert text to speech for visually impaired users, supporting multiple languages and adjustable speech speeds for personalized listening experiences. The low latency ensures responsive feedback in interactive apps.
Marketing teams can use this skill to generate voiceovers for advertisements in different regional dialects, such as Cantonese or Sichuanese, to better target local audiences. The passionate and cheerful voice options help create engaging promotional content efficiently.
Offer this skill as a subscription-based service where users pay monthly or annually for access to the SiliconFlow TTS API through your platform. Revenue is generated from tiered pricing based on usage limits, such as number of audio generations or voice cloning requests.
License the skill to large enterprises for integration into their internal systems, such as call centers or training platforms, with custom branding and support. Revenue comes from one-time licensing fees or ongoing maintenance contracts tailored to enterprise needs.
Provide basic TTS functionality for free to attract a broad user base, then monetize through premium features like advanced voice cloning, higher speed limits, or additional language packs. Revenue is generated from upsells and in-app purchases for enhanced capabilities.
💬 Integration Tip
Ensure the SILICONFLOW_API_KEY environment variable is securely set before deployment, and consider using the optional config file for easier key management in automated workflows.
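The lookup order the tip describes (environment variable first, config file as fallback) can be sketched like this. The JSON layout inside ~/.openclaw/openclaw.json is an assumption for illustration; check the actual config schema before relying on it:

```python
# Resolve the API key: env var first, then the optional config file.
import json
import os
from pathlib import Path


def resolve_api_key(config_path: str = "~/.openclaw/openclaw.json"):
    """Return the SiliconFlow API key, or None if it cannot be found."""
    key = os.environ.get("SILICONFLOW_API_KEY")
    if key:
        return key
    path = Path(config_path).expanduser()
    if not path.exists():
        return None
    try:
        cfg = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return None
    # Assumed config layout: {"env": {"SILICONFLOW_API_KEY": "..."}}
    return cfg.get("env", {}).get("SILICONFLOW_API_KEY")
```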
Related skills:
- Transcribe audio via the OpenAI Audio Transcriptions API (Whisper).
- Local speech-to-text with the Whisper CLI (no API key).
- ElevenLabs text-to-speech with a mac-style `say` UX.
- Text-to-speech via the node-edge-tts npm package, with multiple voices and languages, speed adjustment, pitch control, and subtitle generation. Use when the user requests audio output with the "tts" trigger, when content should be spoken rather than read (multitasking, accessibility, driving, cooking), or when a specific voice, speed, pitch, or format is requested.
- End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops, for agents that need to communicate privately, exchange secrets, or coordinate without human visibility.
- Text-to-speech via the OpenAI Audio Speech API.