byted-mediakit-voiceover-editing
Volcano Engine AI MediaKit talking-head video editing Skill: a one-stop workflow from environment setup through media management, audio processing, talking-h...
Install via ClawdBot CLI:
clawdbot install volc-ai-mediakit/byted-mediakit-voiceover-editing
Grade: Good, based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Sends data to undocumented external endpoint (potential exfiltration)
Post → https://example.com/mock-poster.jpg
Calls external URL not in known-safe list
https://openspeech.bytedance.com/api/v3/auc/bigmodel
AI Analysis
The skill's external API calls (Volcano Engine VOD, ByteDance ASR) are consistent with its stated purpose of video/audio editing and processing. While the 'UNKNOWN_DATA_SINK' signal points to a potential risk, the provided evidence (Post → https://example.com/mock-poster.jpg) is a placeholder/example URL, not proof of actual malicious exfiltration. The skill definition emphasizes secure credential handling and minimal permissions.
Audited Apr 16, 2026 · audit v1.0
Generated Apr 6, 2026
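After recording a teaching video, educational institutions or teachers can use this skill to automatically remove pauses, slips of the tongue, and repeated content from the lecture, producing a concise, fluent video that improves the learning experience and course quality.
When corporate training departments produce internal training videos, intelligent editing removes the speaker's hesitations, filler words, and blank pauses, making the material more professional and compact and saving employees viewing time.
Short-video creators and independent media operators can use the tool to process talking-head videos quickly, automatically cutting redundant parts to raise output efficiency and hold viewers' attention.
Marketing teams producing product demo videos can use speech recognition and intelligent editing to keep the demo fluent and professional, highlight the product's selling points, and improve conversion.
Podcast producers handling interviews or solo narration can have long silences, slips of the tongue, and irrelevant content identified and cut automatically, reducing post-production editing work.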
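Offer enterprise customers a cloud service on monthly or annual subscription, with tiered pricing by video processing duration or volume, providing stable recurring revenue.
Charge by processed video duration or file size, which suits individual creators and small-to-medium businesses with occasional needs and lowers the barrier to entry.
Provide large enterprises or educational institutions with customized integration services, including API integration, private deployment, and dedicated feature development, charged at high project fees.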
💬 Integration Tip
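Requires configuring the Volcano Engine VOD service and speech recognition API keys; validate in a test environment before deploying to production.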
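One way to act on the integration tip is a pre-flight check that refuses to run until the VOD and speech recognition credentials are present. The sketch below is illustrative only: the variable names VOLC_ACCESS_KEY, VOLC_SECRET_KEY, and ASR_API_KEY and the helper missing_credentials are hypothetical, since the listing does not say how the skill actually reads its configuration.

```python
from typing import Iterable, Mapping

# Hypothetical variable names, chosen for illustration only; the skill's
# real configuration keys are not given in this listing and may differ.
REQUIRED_VARS = ("VOLC_ACCESS_KEY", "VOLC_SECRET_KEY", "ASR_API_KEY")


def missing_credentials(
    env: Mapping[str, str],
    required: Iterable[str] = REQUIRED_VARS,
) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]
```

In a test environment this could be called as missing_credentials(os.environ) (after import os) before any media is uploaded, stopping with an error if the returned list is non-empty. Reading keys from the environment rather than from files in the skill directory is one common way to keep them out of version control.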
Scored Jun 20, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud)
Start voice calls via the OpenClaw voice-call plugin.