siliconflow-tts-gen
Text-to-Speech using SiliconFlow API (CosyVoice2). Supports multiple voices, languages, and dialects.
Install via ClawdBot CLI:
clawdbot install lilei0311/siliconflow-tts-gen
Grade: Fair. Based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Calls external URL not in known-safe list
https://github.com/maxstorm/siliconflow-tts-gen
Audited Apr 16, 2026 · audit v1.0
Generated Mar 1, 2026
This skill can be used to generate automated voice responses for customer service hotlines in multiple languages and dialects, such as Chinese, English, and Korean, reducing the need for human operators. It supports ultra-low latency for real-time interactions and can personalize voices to match brand tone.
Educators and e-learning platforms can use this skill to create audio versions of educational materials in various languages and dialects, making content accessible to diverse student populations. The voice cloning feature allows for rapid adaptation of familiar instructor voices across different regional accents.
Content creators can leverage this skill to produce audiobooks or podcasts with multiple character voices, enhancing storytelling with distinct male and female vocal profiles. The auto-download feature simplifies local file management for batch audio generation.
Developers can integrate this skill into applications to convert text to speech for visually impaired users, supporting multiple languages and adjustable speech speeds for personalized listening experiences. The low latency ensures responsive feedback in interactive apps.
Marketing teams can use this skill to generate voiceovers for advertisements in different regional dialects, such as Cantonese or Sichuanese, to better target local audiences. The passionate and cheerful voice options help create engaging promotional content efficiently.
Offer this skill as a subscription-based service where users pay monthly or annually for access to the SiliconFlow TTS API through your platform. Revenue is generated from tiered pricing based on usage limits, such as number of audio generations or voice cloning requests.
License the skill to large enterprises for integration into their internal systems, such as call centers or training platforms, with custom branding and support. Revenue comes from one-time licensing fees or ongoing maintenance contracts tailored to enterprise needs.
Provide basic TTS functionality for free to attract a broad user base, then monetize through premium features like advanced voice cloning, higher speed limits, or additional language packs. Revenue is generated from upsells and in-app purchases for enhanced capabilities.
💬 Integration Tip
Ensure the SILICONFLOW_API_KEY environment variable is securely set before deployment, and consider using the optional config file for easier key management in automated workflows.
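The tip above can be sketched as a small key-resolution helper that prefers the SILICONFLOW_API_KEY environment variable and falls back to a config file. This is a minimal sketch, not the skill's actual loader: the config path, the JSON format, the "api_key" field name, and the resolve_api_key function are all hypothetical and chosen only for illustration.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical location and format; the skill's real config file may differ.
DEFAULT_CONFIG_PATH = Path.home() / ".config" / "siliconflow-tts" / "config.json"


def resolve_api_key(
    env: Optional[Mapping[str, str]] = None,
    config_path: Path = DEFAULT_CONFIG_PATH,
) -> str:
    """Return the API key from the environment, else from a JSON config file."""
    env = os.environ if env is None else env

    # 1. The environment variable named in the tip takes precedence.
    key = env.get("SILICONFLOW_API_KEY", "").strip()
    if key:
        return key

    # 2. Fall back to the optional config file (assumed: {"api_key": "..."}).
    if config_path.is_file():
        data = json.loads(config_path.read_text(encoding="utf-8"))
        if isinstance(data, dict):
            key = str(data.get("api_key", "")).strip()
            if key:
                return key

    # 3. Fail early, before any request is attempted.
    raise RuntimeError(
        "SILICONFLOW_API_KEY is not set and no usable config file was found"
    )
```

Calling resolve_api_key() once at startup makes a missing key fail before deployment instead of on the first synthesis request. If a config file is used, keeping it out of version control and restricting its permissions keeps the key from leaking.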
Scored Apr 19, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud)
Start voice calls via the OpenClaw voice-call plugin.