aliyun-tts
Alibaba Cloud Text-to-Speech synthesis service.
Install via ClawdBot CLI:
clawdbot install guang384/aliyun-tts
Set the following environment variables:
ALIYUN_APP_KEY - Application Key
ALIYUN_ACCESS_KEY_ID - Access Key ID
ALIYUN_ACCESS_KEY_SECRET - Access Key Secret (sensitive)
# Configure App Key
clawdbot skills config aliyun-tts ALIYUN_APP_KEY "your-app-key"
# Configure Access Key ID
clawdbot skills config aliyun-tts ALIYUN_ACCESS_KEY_ID "your-access-key-id"
# Configure Access Key Secret (sensitive)
clawdbot skills config aliyun-tts ALIYUN_ACCESS_KEY_SECRET "your-access-key-secret"
Alternatively, edit ~/.clawdbot/clawdbot.json directly:
{
  skills: {
    entries: {
      "aliyun-tts": {
        env: {
          ALIYUN_APP_KEY: "your-app-key",
          ALIYUN_ACCESS_KEY_ID: "your-access-key-id",
          ALIYUN_ACCESS_KEY_SECRET: "your-access-key-secret"
        }
      }
    }
  }
}
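When running the script by hand outside ClawdBot, it can help to confirm that all three variables are actually present before calling the service. The following is a minimal sketch, assuming the variables are exported in the current shell; the function name `check_aliyun_env` is hypothetical and not part of the skill. It prints only variable names, never their values.

```shell
#!/bin/sh
# Hypothetical helper (not part of the skill): report which of the three
# required variables are unset or empty. Prints one "missing: NAME" line per
# problem and returns non-zero if anything is missing.
check_aliyun_env() {
  missing=0
  for name in ALIYUN_APP_KEY ALIYUN_ACCESS_KEY_ID ALIYUN_ACCESS_KEY_SECRET; do
    # Look up the variable whose name is stored in $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_aliyun_env || exit 1` at the top of a script stops early with a readable message instead of failing later during the request.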
# Basic usage
{baseDir}/bin/aliyun-tts "Hello, this is Aliyun TTS"
# Specify output file
{baseDir}/bin/aliyun-tts -o /tmp/voice.mp3 "Hello"
# Specify voice
{baseDir}/bin/aliyun-tts -v siyue "Use siyue voice"
# Specify format and sample rate
{baseDir}/bin/aliyun-tts -f mp3 -r 16000 "Audio parameters"
| Flag | Description | Default |
|------|-------------|---------|
| -o, --output | Output file path | tts.mp3 |
| -v, --voice | Voice name | siyue |
| -f, --format | Audio format | mp3 |
| -r, --sample-rate | Sample rate | 16000 |
Common voices: siyue, xiaoxuan, xiaoyun, etc. See Alibaba Cloud documentation for the full list.
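The flags in the table can presumably be combined in a single call. As an illustration, the sketch below wraps the script in a function that passes every flag explicitly, falling back to the documented defaults. The function name `tts_say` and the `ALIYUN_TTS_BIN` and `TTS_*` variables are assumptions made for this example, not part of the skill.

```shell
#!/bin/sh
# Hypothetical wrapper: pass all four documented flags explicitly, using the
# defaults from the table unless a TTS_* override is set.
# ALIYUN_TTS_BIN (an assumed variable) must hold the path of bin/aliyun-tts.
tts_say() {
  "${ALIYUN_TTS_BIN:?set ALIYUN_TTS_BIN to the aliyun-tts script path}" \
    -o "${TTS_OUTPUT:-tts.mp3}" \
    -v "${TTS_VOICE:-siyue}" \
    -f "${TTS_FORMAT:-mp3}" \
    -r "${TTS_SAMPLE_RATE:-16000}" \
    "$1"
}
```

For example, `TTS_VOICE=xiaoyun TTS_OUTPUT=/tmp/hello.mp3 tts_say "Hello"` would change only the voice and the output path and leave the format and sample rate at their defaults.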
When a user requests a voice reply:
# Generate audio
{baseDir}/bin/aliyun-tts -o /tmp/voice-reply.mp3 "Your reply content"
# Include in your response:
# MEDIA:/tmp/voice-reply.mp3
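The two steps above can be sketched as one function that emits the MEDIA: line only when synthesis appears to have worked. This assumes the script exits non-zero on failure, which the source does not state, so the sketch also checks that the output file is non-empty. `voice_reply` and `ALIYUN_TTS_BIN` are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Sketch of the voice-reply flow: synthesize, then print the MEDIA: line
# only if the tool succeeded and left a non-empty file behind.
# ALIYUN_TTS_BIN (an assumed variable) holds the path of bin/aliyun-tts.
voice_reply() {
  reply_file="${2:-/tmp/voice-reply.mp3}"
  if "$ALIYUN_TTS_BIN" -o "$reply_file" "$1" && [ -s "$reply_file" ]; then
    echo "MEDIA:$reply_file"
  else
    echo "voice reply failed; fall back to a text reply" >&2
    return 1
  fi
}
```

Usage: `voice_reply "Your reply content"` prints `MEDIA:/tmp/voice-reply.mp3` on success; an optional second argument overrides the output path.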
Generated Mar 1, 2026
Automatically generate voiceovers for educational videos, tutorials, or language learning apps using Aliyun TTS. This reduces production costs and time compared to hiring voice actors, while supporting multiple languages and voices for diverse content.
Integrate TTS into IVR systems or chatbots to provide spoken responses for customer inquiries, such as order status updates or FAQs. This enhances user experience by offering audio feedback, especially in call centers or mobile apps.
Add text-to-speech functionality to mobile or web applications to assist visually impaired users by reading out text content like articles, notifications, or menus. This improves accessibility compliance and user engagement.
Create voiceovers for podcasts, audiobooks, or game dialogues using customizable voices and parameters. This allows for rapid prototyping and content scaling without the need for recording studios.
Embed TTS in smart home devices, such as speakers or assistants, to provide spoken alerts, weather updates, or reminders. Because synthesis runs in the cloud, the device only needs a network connection and audio playback; response time depends on network conditions.
Offer the TTS skill as part of a subscription-based platform where users pay monthly or annually for access to voice synthesis features. This model can include tiered pricing based on usage limits or advanced voices.
Charge customers based on the number of characters or audio minutes processed through the TTS service. This is ideal for scalable applications with variable demand, such as media companies or app developers.
License the TTS technology to other companies for integration into their own products, such as call center software or educational tools. This includes customization and support services for a flat fee or revenue share.
💬 Integration Tip
Ensure environment variables are securely stored and test voice outputs with different parameters to match specific use cases.
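One way to compare voices is to render the same sentence once per voice and listen to the results. The sketch below does this for whichever voice names are passed in; `sample_voices` and `ALIYUN_TTS_BIN` are hypothetical names for this example, and the voice names in the usage note are the ones listed earlier in this document.

```shell
#!/bin/sh
# Render the same text once per voice so the outputs can be compared by ear.
# Arguments: text, output directory, then one or more voice names.
# ALIYUN_TTS_BIN (an assumed variable) holds the path of bin/aliyun-tts.
sample_voices() {
  text="$1"
  dir="$2"
  shift 2
  for voice in "$@"; do
    "$ALIYUN_TTS_BIN" -v "$voice" -o "$dir/sample-$voice.mp3" "$text" \
      || echo "failed: $voice" >&2
  done
}
```

Usage: `sample_voices "This is a voice sample." /tmp siyue xiaoxuan xiaoyun` writes `/tmp/sample-siyue.mp3` and so on, and reports any voice that fails without stopping the loop.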