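feishu-whisper-voice
Uses Faster-Whisper for high-accuracy speech recognition together with Feishu's built-in TTS to recognize voice messages and reply by voice, enabling two-way voice conversation.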
Install via ClawdBot CLI:
clawdbot install 15071664/feishu-whisper-voice
Grade Limited: based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Calls external URL not in known-safe list
https://download.pytorch.org/whl/cu118
Audited Apr 17, 2026 · audit v1.0
Generated Mar 18, 2026
Integrate into customer service platforms to automatically transcribe user voice inquiries and generate voice responses, reducing wait times and improving accessibility for non-text users. Ideal for handling high-volume support calls in industries like e-commerce or telecommunications.
Use in educational apps to provide real-time speech recognition for language practice, allowing students to speak and receive instant feedback via voice. Enhances interactive learning experiences in online tutoring or language courses.
Deploy in healthcare settings to transcribe patient-doctor conversations into text records, improving accuracy and efficiency in medical documentation. Can be integrated with EHR systems for seamless data entry.
Implement in collaboration tools like Feishu to transcribe meeting audio into searchable text notes, facilitating better information retention and follow-up actions for remote or hybrid teams.
Embed in apps to convert text content into speech and vice versa, enabling visually impaired users to interact with digital platforms through voice commands and audio feedback.
Offer the skill as a cloud-based service with tiered pricing based on usage volume, such as number of transcriptions or voice interactions per month. Targets businesses needing scalable voice AI solutions without infrastructure management.
Sell custom licenses to large organizations for on-premise or private cloud deployment, including dedicated support and customization. Suitable for industries with strict data privacy requirements like finance or healthcare.
Provide API access where clients pay per transcription or TTS request, allowing developers to integrate voice features into their apps flexibly. Appeals to startups and small businesses with variable usage needs.
💬 Integration Tip
Start with the base Whisper model for CPU to minimize setup complexity, and ensure FFmpeg is installed for optimal audio processing.
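The tip above can be sketched in Python roughly as follows. This is a minimal sketch, not the skill's actual code: the helper names (`ffmpeg_available`, `join_segments`, `transcribe_cpu`) are hypothetical, and the `int8` compute type is an assumed CPU-friendly setting rather than something the listing specifies. It relies on the `faster-whisper` package's `WhisperModel` class, imported lazily so the small helpers work even before the package is installed. As far as we know, faster-whisper decodes audio through PyAV, so the FFmpeg check only warns; it mirrors the tip in case other steps of the skill need the `ffmpeg` binary.

```python
import shutil
import warnings
from typing import Iterable


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def join_segments(segments: Iterable) -> str:
    """Concatenate the `.text` of each transcription segment into one string."""
    return " ".join(seg.text.strip() for seg in segments).strip()


def transcribe_cpu(audio_path: str, model_size: str = "base") -> str:
    """Transcribe one audio file on CPU (hypothetical helper, not the skill's API)."""
    # Lazy import: the helpers above stay usable without faster-whisper installed.
    from faster_whisper import WhisperModel

    if not ffmpeg_available():
        # Assumption: other parts of the pipeline may call the ffmpeg binary.
        warnings.warn("ffmpeg not found on PATH; audio conversion steps may fail")

    # "base" on CPU keeps setup simple; int8 is an assumed low-memory setting.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    return join_segments(segments)
```

Calling `transcribe_cpu("voice_message.ogg")` (the file name is a placeholder) would download the base model on first use and return the recognized text as a single string.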
Scored Apr 19, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Start voice calls via the OpenClaw voice-call plugin.
Local text-to-speech via sherpa-onnx (offline, no cloud).