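youtube-whisper
Convert YouTube videos to text in one step: the skill automatically downloads the video and uses AI to transcribe it into Chinese or English subtitles, and it also works on videos that have no subtitles.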
Install via ClawdBot CLI:
clawdbot install dolphins1123/youtube-whisper
Grade: Fair (based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals).
Calls external URL not in known-safe list
https://www.youtube.com/watch?v=VIDEO_ID
Audited Apr 17, 2026 · audit v1.0
Generated Mar 21, 2026
Researchers can use this skill to transcribe YouTube lectures, interviews, or documentaries that lack subtitles for qualitative analysis. Once the video is downloaded, transcription runs locally and does not rely on platform captions. The skill supports multiple languages and Whisper models, so a larger model can be chosen when specialized terminology needs higher accuracy.
Content creators and localization teams can transcribe YouTube videos to generate subtitles or translate content for audiences in different regions, especially when automatic captions are unavailable or inaccurate. Because transcription runs locally, the audio is not sent to a third-party transcription service.
Organizations can provide accessible content by transcribing YouTube videos for deaf or hard-of-hearing users, creating text-based alternatives for videos that lack built-in subtitles. The skill allows for offline processing to handle sensitive or proprietary video materials securely.
Legal professionals can transcribe YouTube videos as evidence or for compliance reviews, such as recording public statements or training videos, ensuring accurate text records when captions are missing. The local Whisper model maintains confidentiality without cloud dependencies.
Educators and language learners can transcribe YouTube videos in various languages to create study materials, practice listening comprehension, or develop interactive lessons, leveraging the skill's support for multiple models to balance speed and accuracy.
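The download-then-transcribe flow behind these use cases can be sketched as two command lines. This is a minimal sketch, assuming the skill wraps the yt-dlp and openai-whisper command-line tools; the skill's actual invocation is not documented on this page, and the helper names, output layout, and default model below are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

# Model sizes offered by the openai-whisper CLI, fastest first.
# Smaller models are quicker; larger ones are generally more accurate.
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


def build_download_cmd(url: str, workdir: Path) -> List[str]:
    """Build a yt-dlp command that saves only the audio track as MP3."""
    return [
        "yt-dlp",
        "-x",                                    # extract audio only
        "--audio-format", "mp3",
        "-o", str(workdir / "%(id)s.%(ext)s"),   # name the file after the video ID
        url,
    ]


def build_transcribe_cmd(audio: Path, model: str = "small",
                         language: Optional[str] = None) -> List[str]:
    """Build a whisper command that writes an .srt file next to the audio."""
    if model not in WHISPER_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    cmd = [
        "whisper", str(audio),
        "--model", model,
        "--output_format", "srt",
        "--output_dir", str(audio.parent),
    ]
    if language:                                 # e.g. "zh" or "en"; omit to auto-detect
        cmd += ["--language", language]
    return cmd
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes it easy to swap the model, for example `tiny` for a quick draft and `medium` for a final transcript.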
Offer a web-based platform where users can submit YouTube URLs for transcription, with free basic usage and paid tiers for higher-accuracy models, faster processing, or bulk operations. Revenue comes from subscription plans and API access for developers.
License the skill as part of a larger software suite for businesses in media, education, or legal sectors, providing custom integrations, support, and enhanced features like batch processing. Revenue is generated through one-time licenses or annual maintenance contracts.
Partner with video platforms, content management systems, or accessibility tool providers to integrate the transcription capability, earning revenue through referral fees, revenue sharing, or white-label solutions. This model leverages existing user bases for scalable adoption.
💬 Integration Tip
Integrate with existing video platforms via APIs to automate transcription workflows, and ensure dependencies like yt-dlp and Whisper are pre-installed in deployment environments for seamless operation.
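A deployment can verify the pre-installed dependencies before accepting any jobs. The sketch below is a hypothetical preflight check, not part of the skill itself: it assumes the tools are exposed on `PATH` as `yt-dlp` and `whisper`, and it adds `ffmpeg` to the list because both tools rely on it for audio handling, which this page does not state.

```python
import shutil
from typing import Iterable, List

# Executable names are assumptions; adjust them to match the deployment image.
REQUIRED_TOOLS = ("yt-dlp", "whisper", "ffmpeg")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if shutil.which(tool) is None]


def preflight() -> None:
    """Raise early with a clear message if a dependency is not installed."""
    missing = missing_tools()
    if missing:
        raise RuntimeError("missing dependencies: " + ", ".join(missing))
```

Calling `preflight()` at service start-up surfaces a missing binary immediately, rather than partway through the first transcription job.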
Scored Apr 19, 2026
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Start voice calls via the OpenClaw voice-call plugin.
Local text-to-speech via sherpa-onnx (offline, no cloud)