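moss-transcribe-diarize
MOSS multi-speaker transcription skill. Accepts audio as a URL, a local file, or Base64 input, and outputs structured transcripts with timestamps and speaker labels (JSON, per-segment text, and text aggregated by speaker). Intended for meeting minutes, interview recordings, and multi-party conversations. Requires API credentials (environment variables: MOSS_API_KEY, compatible with MOSI_TTS_API_KEY / MOS...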
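The three input forms can be told apart before a request is built. The sketch below is a hypothetical helper, not part of the skill: the function name `build_audio_input` and the `{"type": ..., "value": ...}` payload shape are assumptions, and it assumes local files are sent inline as Base64 rather than uploaded.

```python
import base64
import os
from urllib.parse import urlparse


def build_audio_input(source: str) -> dict:
    """Classify an audio source as a URL, a local file, or inline Base64.

    The returned dict shape is hypothetical; it is NOT the documented
    MOSS request format.
    """
    if urlparse(source).scheme in ("http", "https"):
        return {"type": "url", "value": source}
    if os.path.isfile(source):
        # Assumption: local files are encoded inline so they fit in a JSON body.
        with open(source, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("ascii")
        return {"type": "base64", "value": encoded}
    # Otherwise treat the string as Base64 already; raises binascii.Error if invalid.
    base64.b64decode(source, validate=True)
    return {"type": "base64", "value": source}
```

For example, `build_audio_input("https://example.com/audio.mp3")` is classified as a URL, while a path to an existing `.wav` file is read and encoded.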
Install via ClawdBot CLI:
clawdbot install helloeveryworlds/moss-transcribe-diarize
Grade: Fair (based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals).
Calls external URL not in known-safe list
https://example.com/audio.mp3
Audited Apr 16, 2026 · audit v1.0
Generated Mar 20, 2026
Automatically transcribes and diarizes meeting recordings from platforms like Zoom or Teams, identifying speakers and timestamps. Outputs structured JSON for easy integration into project management tools, enabling efficient minute-taking and action item tracking.
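Turning the JSON output into readable minutes is mostly a formatting step. This sketch assumes a hypothetical result schema, `{"segments": [{"start", "end", "speaker", "text"}]}` with times in seconds; the real field names may differ, and `segments_to_lines` is an illustrative helper, not part of the skill.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(raw_json: str) -> list[str]:
    """Turn a diarized JSON result into '[HH:MM:SS] speaker: text' lines.

    Assumes the hypothetical schema described above, not a documented one.
    """
    segments = json.loads(raw_json)["segments"]
    return [
        f"[{format_timestamp(seg['start'])}] {seg['speaker']}: {seg['text']}"
        for seg in sorted(segments, key=lambda seg: seg["start"])
    ]
```

Sorting by `start` keeps the lines chronological even if segments arrive out of order.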
Processes audio from interviews or focus groups, separating multiple speakers so responses can be analyzed by participant. Groups the transcript text by speaker for qualitative research, reducing manual transcription time and making each statement easier to attribute.
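Grouping by speaker amounts to bucketing segments by their label. A minimal sketch, assuming the same hypothetical segment fields (`start`, `speaker`, `text`); `group_by_speaker` is an illustrative name, not the skill's API.

```python
from collections import defaultdict


def group_by_speaker(segments: list[dict]) -> dict[str, str]:
    """Concatenate each speaker's segment texts in chronological order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for seg in sorted(segments, key=lambda seg: seg["start"]):
        grouped[seg["speaker"]].append(seg["text"])
    return {speaker: " ".join(texts) for speaker, texts in grouped.items()}
```

Speaker labels from diarization are anonymous; mapping them to participant names is a separate, manual step.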
Transcribes legal proceedings, depositions, or client consultations with speaker diarization to attribute statements accurately. Provides timestamped JSON outputs for evidence logging and compliance, streamlining legal documentation workflows.
Analyzes customer service calls by transcribing them and separating the agent's speech from the customer's, which supports downstream sentiment analysis and training reviews. Outputs segmented text for review, helping improve service quality and compliance monitoring.
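One simple call-review metric is each speaker's share of talk time. This sketch assumes the hypothetical `start`/`end` (seconds) and `speaker` fields; deciding which label is the agent and which is the customer is left to the caller, since diarization alone does not assign roles.

```python
def talk_time(segments: list[dict]) -> dict[str, float]:
    """Sum speaking duration in seconds per speaker label."""
    totals: dict[str, float] = {}
    for seg in segments:
        duration = max(0.0, seg["end"] - seg["start"])  # ignore malformed negatives
        totals[seg["speaker"]] = totals.get(seg["speaker"], 0.0) + duration
    return totals


def talk_share(segments: list[dict]) -> dict[str, float]:
    """Return each speaker's fraction of total talk time (empty if no speech)."""
    totals = talk_time(segments)
    overall = sum(totals.values())
    return {spk: t / overall for spk, t in totals.items()} if overall else {}
```

Overlapping speech is counted for both speakers, so the raw totals can exceed the call length.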
Transcribes lectures or podcasts with multiple hosts or guests, creating structured transcripts for accessibility or content repurposing. Groups text by speaker to simplify editing and publishing.
Offers a cloud-based API service with tiered pricing based on audio duration or usage volume. Targets businesses needing scalable transcription, with premium features like advanced analytics and integrations, generating recurring revenue.
Provides on-premise or custom deployments for large organizations with high security and compliance needs. Includes dedicated support and customization, sold as annual licenses or project-based contracts for steady income.
Offers a free tier with limited features to attract individual users or small teams, then charges per transcription minute or API call for advanced usage. Encourages adoption and scales with user growth for flexible revenue streams.
💬 Integration Tip
Set the API key in an environment variable (MOSS_API_KEY) before running, and call the skill from Python scripts to automate it within existing audio-processing pipelines.
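A pipeline script can fail fast when no credential is set. This sketch checks MOSS_API_KEY first and then MOSI_TTS_API_KEY; that precedence is an assumption, and the listing's third compatible variable name is truncated, so it is deliberately not handled. `resolve_api_key` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first non-empty credential from the known variable names.

    Assumed precedence: MOSS_API_KEY, then MOSI_TTS_API_KEY. The third
    compatible name in the listing is truncated and is omitted here.
    """
    env = os.environ if env is None else env
    for name in ("MOSS_API_KEY", "MOSI_TTS_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("Set MOSS_API_KEY (or MOSI_TTS_API_KEY) before running.")
```

Passing a mapping explicitly makes the lookup easy to test without touching the real environment.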
Scored Apr 19, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud).
Start voice calls via the OpenClaw voice-call plugin.