alicloud-ai-audio-tts

Generate human-like speech audio with Model Studio DashScope Qwen TTS models (qwen3-tts-flash, qwen3-tts-instruct-flash). Use when converting text to speech, ...
Install via ClawdBot CLI:
clawdbot install cinience/alicloud-ai-audio-tts

Category: provider
Use one of the recommended models:
- qwen3-tts-flash
- qwen3-tts-instruct-flash
- qwen3-tts-instruct-flash-2026-01-26

Set up a Python environment and install the SDK:

python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials (env takes precedence).

Input parameters:
- text (string, required)
- voice (string, required)
- language_type (string, optional; default Auto)
- instruction (string, optional; recommended for instruct models)
- stream (bool, optional; default false)

Output fields:
- audio_url (string, when stream=false)
- audio_base64_pcm (string, when stream=true)
- sample_rate (int, 24000)
- format (string, wav or pcm depending on mode)

import os
import dashscope
# Prefer env var for auth: export DASHSCOPE_API_KEY=...
# Or use ~/.alibabacloud/credentials with dashscope_api_key under [default].
# Beijing region; for Singapore use: https://dashscope-intl.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
text = "Hello, this is a short voice line."
response = dashscope.MultiModalConversation.call(
    model="qwen3-tts-instruct-flash",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English",
    instruction="Warm and calm tone, slightly slower pace.",
    stream=False,
)
audio_url = response.output.audio.url
print(audio_url)
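The returned URL is typically time-limited, so you will usually want to persist the audio locally. A minimal sketch, using only the standard library; the helper name `save_audio` is hypothetical and not part of the skill's API:

```python
import urllib.request

# Hypothetical helper: download the audio at the URL returned in
# response.output.audio.url and write it to a local file.
def save_audio(audio_url: str, path: str = "output.wav") -> str:
    with urllib.request.urlopen(audio_url) as resp, open(path, "wb") as f:
        f.write(resp.read())
    return path
```

For example, `save_audio(audio_url, "voiceover.wav")` after the call above.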
Notes:
- stream=True returns Base64-encoded PCM chunks at 24 kHz; finish_reason == "stop" signals the stream has ended.
- Keep language_type consistent with the text to improve pronunciation.
- Use instruction only when you need explicit style/tone control.
- Cache results by (text, voice, language_type) to avoid repeat costs.
- Output is written under output/ai-audio-tts/audio/ (or OUTPUT_DIR).
- See references/api_reference.md for parameter mapping and a streaming example.
- For realtime synthesis, see skills/ai/audio/alicloud-ai-audio-tts-realtime/.
- For voice cloning and voice design, see skills/ai/audio/alicloud-ai-audio-tts-voice-clone/ and skills/ai/audio/alicloud-ai-audio-tts-voice-design/.
- Sources: references/sources.md

Generated Mar 1, 2026
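When using stream=True, the Base64 PCM chunks need to be decoded and wrapped in a WAV container before most players can use them. A minimal sketch assuming 16-bit mono PCM at 24 kHz (the sample format is an assumption; check references/api_reference.md):

```python
import base64
import io
import wave

def pcm_chunks_to_wav(b64_chunks, sample_rate=24000):
    """Join Base64 PCM chunks (assumed 16-bit mono) into WAV bytes."""
    pcm = b"".join(base64.b64decode(c) for c in b64_chunks)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 16-bit samples (assumed)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```

Collect each chunk's audio_base64_pcm field from the stream, then write `pcm_chunks_to_wav(chunks)` to a .wav file.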
Generate voiceovers for social media videos like TikTok or Instagram Reels, where quick, engaging narration is needed. Useful for creators producing drama skits, news recaps, or educational snippets without recording equipment.
Convert written course materials or training scripts into audio for online learning platforms. Helps create accessible content for visual learners or multilingual audiences by adjusting language and tone.
Produce pre-recorded audio responses for IVR systems or chatbots to enhance user interactions. Can be used for announcements, instructions, or feedback prompts in call centers.
Automate the generation of audio summaries from text reports or logs, such as converting daily news articles or technical documentation into spoken format for hands-free consumption.
Create audio versions of websites, books, or documents to assist individuals with visual impairments. Supports multiple languages and customizable voices for better user experience.
Offer a cloud-based platform where users pay a monthly fee to access TTS generation with advanced features like custom voices and high-volume usage. Targets influencers, marketers, and small businesses needing regular audio content.
License the TTS technology to large companies for integration into their internal systems, such as e-learning platforms or customer service tools. Includes support and customization based on usage volume.
Provide basic TTS generation for free to attract individual users, then upsell premium features like faster processing, exclusive voices, or advanced streaming capabilities. Focuses on building a user base and converting to paid plans.
Integration Tip
Ensure the DASHSCOPE_API_KEY is set in environment variables for seamless authentication, and cache audio outputs to reduce API costs and latency.
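The caching advice above can be sketched as a content-addressed file cache keyed by (text, voice, language_type). The names `cache_path` and `get_or_synthesize` are hypothetical helpers, not part of the skill's API:

```python
import hashlib
import os

CACHE_DIR = "output/ai-audio-tts/audio"  # default output dir per this skill

def cache_path(text: str, voice: str, language_type: str = "Auto") -> str:
    """Deterministic cache file for one (text, voice, language_type) request."""
    key = hashlib.sha256(f"{text}\x00{voice}\x00{language_type}".encode()).hexdigest()
    return os.path.join(CACHE_DIR, f"{key}.wav")

def get_or_synthesize(text, voice, language_type, synthesize):
    """Call `synthesize` (any callable returning WAV bytes) only on a cache miss."""
    path = cache_path(text, voice, language_type)
    if not os.path.exists(path):
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(path, "wb") as f:
            f.write(synthesize(text, voice, language_type))
    return path
```

Repeated requests with identical parameters then return the cached file instead of re-billing the API.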
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Text-to-speech via OpenAI Audio Speech API.