qwen-tts
Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.
Install via ClawdBot CLI:
clawdbot install paki81/qwen-tts
Grade: Good (based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals).
Calls external URL not in known-safe list: https://hf-mirror.com
Audited Apr 16, 2026 · audit v1.0
Generated Feb 24, 2026
Content creators and marketers can generate voiceovers for videos, podcasts, or social media in multiple languages without relying on cloud services. This is ideal for producing Italian or other language content with emotion control for engaging storytelling.
Developers can integrate this TTS into applications to provide text-to-speech features for visually impaired users or language learners. The offline capability ensures privacy and reliability in educational or assistive technology tools.
Businesses can use this skill to generate automated voice responses or interactive voice systems in customer support, with support for 10 languages and customizable tones. It offers a cost-effective alternative to cloud-based TTS for localized service.
Individuals or small teams can create personalized voice messages for communication apps or notifications in different languages, leveraging the premium speaker voices and instruction-based emotion control for expressive audio.
AI researchers and hobbyists can quickly prototype TTS functionalities in projects like chatbots or virtual assistants, using the local model to avoid API costs and latency issues during development phases.
Offer a basic version of this TTS skill for free in open-source projects or tools, with premium features like additional speaker voices or advanced emotion controls available via subscription. This attracts users while generating recurring revenue from power users.
License the TTS technology to companies for internal use in applications like training modules or automated systems, with custom support and integration services. This model leverages the offline and multilingual capabilities for secure, scalable solutions.
Create a platform where users can generate and sell voiceovers or audio content using this skill, taking a commission on transactions. This taps into the growing demand for localized and emotive audio in media production.
💬 Integration Tip
The script prints the path of the generated audio file to stdout. Capture that path so that workflows like OpenClaw can pick up the audio file automatically for further processing.
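The tip above can be sketched in Python as follows. This is a minimal illustration, not the skill's own code: the helper name capture_output_path is hypothetical, and it assumes the script prints the audio file path as the last non-empty line of stdout. The listing does not give the script's name or flags, so a stand-in command is used in place of the real invocation.

```python
import subprocess
import sys
from pathlib import Path


def capture_output_path(cmd: list[str]) -> Path:
    """Run a command and return the last non-empty stdout line as a Path.

    Assumption: the TTS script prints the generated file's path as its
    final line of stdout. Adjust the parsing if the real script differs.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    if not lines:
        raise RuntimeError("command printed no output path on stdout")
    return Path(lines[-1])


if __name__ == "__main__":
    # Stand-in command that mimics a TTS script: it prints a log line,
    # then a file path. Replace it with the skill's actual script call.
    stand_in = [
        sys.executable,
        "-c",
        "print('loading model'); print('/tmp/out.wav')",
    ]
    audio_path = capture_output_path(stand_in)
    print(f"audio written to: {audio_path}")
```

Taking only the last non-empty line keeps the helper tolerant of progress messages printed before the path. With check=True, a non-zero exit status raises subprocess.CalledProcessError, so a failed generation is not mistaken for a valid path.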
Scored Apr 19, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud)
Start voice calls via the OpenClaw voice-call plugin.