qwen-audio
High-performance audio library with text-to-speech (TTS) and speech-to-text (STT).
Install via ClawdBot CLI:
clawdbot install DarkNoah/qwen-audio
Grade: Fair. Based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Calls external URL not in known-safe list
https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav
Audited Apr 17, 2026 · audit v1.0
Generated Mar 21, 2026
Enables rapid creation of audiobooks or podcast episodes using high-quality TTS with customizable voices. Publishers can clone narrator voices for consistency across series and generate transcripts via STT for accessibility or marketing materials.
Integrates TTS to provide natural-sounding voice responses in IVR systems or chatbots, with voice cloning for brand-specific tones. STT transcribes customer calls for analysis, improving service quality and compliance.
Facilitates the development of interactive e-learning content by converting text lessons into speech with engaging, instructor-like voices. STT can transcribe student audio submissions for feedback or assessment.
Supports game developers and animators in generating character dialogues quickly using TTS with emotion control via instruct parameters. Voice cloning allows for unique character voices without hiring multiple actors.
Assists healthcare providers by transcribing patient consultations via STT for accurate medical records. TTS can convert written instructions into speech for patients with visual impairments, using calm, professional voices.
Offer the TTS and STT capabilities as cloud-based APIs with tiered pricing based on usage volume. Target businesses needing scalable audio processing, such as call centers or content creators, with premium features like advanced voice cloning.
License the skill as a customizable, on-premises solution for large organizations in industries like finance or healthcare. Provide integration support and maintenance contracts to help meet data privacy and industry compliance requirements.
Develop a user-friendly web or desktop application with free basic TTS/STT features and paid upgrades for high-quality voices, batch processing, or commercial use. Monetize through in-app purchases or premium subscriptions.
💬 Integration Tip
Ensure Python 3.10+ is installed and environment checks are completed before deployment; use the voice pre-check workflow to manage voice profiles efficiently.
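The Python version requirement in the tip above can be verified with a short pre-deployment script. The following is a minimal sketch using only the standard library; the function name `python_version_ok` and the constant `MIN_PYTHON` are hypothetical and not part of qwen-audio, and the skill's own environment checks and voice pre-check workflow are not reproduced here.

```python
import sys

# Minimum interpreter version stated in the integration tip (3.10+).
MIN_PYTHON = (3, 10)


def python_version_ok(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) interpreter meets the minimum.

    Hypothetical helper for illustration; not provided by qwen-audio.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) against the required minimum.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not python_version_ok():
        found = ".".join(str(part) for part in sys.version_info[:3])
        sys.exit(f"Python 3.10+ is required, found {found}")
    print("Python version check passed")
```

Running the script before deployment stops early with a non-zero exit status on an older interpreter, which makes it easy to wire into a setup or CI step.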
Scored Apr 19, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Start voice calls via the OpenClaw voice-call plugin.
Local text-to-speech via sherpa-onnx (offline, no cloud).