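shumianyu takes a stream of segmented ASR text and, once the stream ends, consolidates it into fluent written prose; when the user specifies a target_language, it outputs a translation instead. It suits voice dictation, meeting transcription, interview write-ups, and post-processing of podcast or video subtitles, and is especially suited to tasks that do not need real-time captions and only need the final written result after the whole recording has ended. The interface is kept abstract and swappable and is not tied to any specific vendor API.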
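The buffer-then-finalize flow described above can be sketched roughly as follows. This is an illustration only, not the skill's actual code: `SegmentBuffer`, `Rewriter`, and `passthrough_rewriter` are hypothetical names, and the stand-in backend merely normalizes whitespace where a real backend would rewrite and translate.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical backend signature: (raw_text, target_language) -> final text.
# Any vendor or local model could be wrapped to match it.
Rewriter = Callable[[str, Optional[str]], str]


def passthrough_rewriter(text: str, target_language: Optional[str]) -> str:
    """Stand-in backend: collapses whitespace only; it does not translate."""
    return " ".join(text.split())


@dataclass
class SegmentBuffer:
    """Collects ASR segments and processes them once, at end of stream."""

    rewriter: Rewriter = passthrough_rewriter
    target_language: Optional[str] = None
    _segments: List[str] = field(default_factory=list, init=False)

    def add(self, segment: str) -> None:
        # Ignore empty or whitespace-only segments from the recognizer.
        cleaned = segment.strip()
        if cleaned:
            self._segments.append(cleaned)

    def finalize(self) -> str:
        # Join everything received so far, hand it to the backend in one
        # call, and reset the buffer for the next recording.
        raw = " ".join(self._segments)
        self._segments.clear()
        return self.rewriter(raw, self.target_language)


buf = SegmentBuffer()
buf.add("so um  today we")
buf.add("   ")
buf.add("discuss the budget ")
result = buf.finalize()
```

Passing the backend in as a plain callable is one way to keep the interface swappable: replacing `passthrough_rewriter` with a function that calls a different service changes nothing else in the flow.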
Install via ClawdBot CLI:
clawdbot install shumianyu
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio files to text using local Whisper (Docker). Use when receiving voice messages, audio files (.mp3, .m4a, .ogg, .wav, .webm), or when asked to transcribe audio content.
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.