gemini-yt-video-transcript
Create a verbatim transcript for a YouTube URL using Google Gemini (speaker labels, paragraph breaks; no time codes). Use when the user asks to transcribe a YouTube video or wants a clean transcript (no timestamps).
Install via ClawdBot CLI:
clawdbot install odrobnik/gemini-yt-video-transcript
Create a verbatim transcript for a YouTube URL using Google Gemini.
Output format
Speaker: text
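To make the format concrete, here is a minimal sketch of reading such a transcript back into (speaker, text) pairs. It assumes paragraphs are separated by blank lines and that each labeled paragraph starts with `Speaker: `; the function name `parse_transcript` is hypothetical and not part of the skill.

```python
def parse_transcript(text):
    """Split a 'Speaker: text' transcript into (speaker, text) tuples.

    Assumption: paragraphs are separated by blank lines. A paragraph
    without a 'Speaker: ' prefix is treated as a continuation of the
    previous speaker's turn.
    """
    turns = []
    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        speaker, sep, speech = block.partition(": ")
        if sep:
            turns.append((speaker, speech))
        elif turns:
            # Unlabeled paragraph: append to the previous turn.
            prev_speaker, prev_speech = turns[-1]
            turns[-1] = (prev_speaker, prev_speech + "\n\n" + block)
        else:
            # Unlabeled paragraph with no earlier speaker.
            turns.append(("", block))
    return turns
```

One limitation of this sketch: an unlabeled paragraph that happens to contain `: ` would be misread as a new speaker, so treat the result as a rough split rather than a reliable parse.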
Requirements
python3 {baseDir}/scripts/youtube_transcript.py "https://www.youtube.com/watch?v=..."
Options:
--out Write transcript to a specific file (default: auto-named in the workspace out/ folder).
When chatting: send the resulting transcript as a document/attachment.
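If the script is called from other Python code, the invocation above could be assembled as in the sketch below. The helper name `build_command` and its parameters are hypothetical; only the script name and the `--out` option come from this page, and `{baseDir}` stays a placeholder that the caller must resolve.

```python
import shlex


def build_command(url, script_path, out_path=None):
    """Return the argument list for running the transcript script.

    script_path is wherever youtube_transcript.py lives once {baseDir}
    has been resolved (an assumption: this sketch does not resolve it).
    """
    cmd = ["python3", script_path, url]
    if out_path is not None:
        # --out is the only option documented above.
        cmd += ["--out", out_path]
    return cmd


if __name__ == "__main__":
    # Prints the command instead of running it, so the sketch has no
    # side effects; pass the list to subprocess.run() to execute it.
    cmd = build_command(
        "https://www.youtube.com/watch?v=...",
        "{baseDir}/scripts/youtube_transcript.py",
        out_path="transcript.txt",
    )
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems with URLs that contain `&` or `?`.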
Generated Feb 24, 2026
Researchers and students can use this skill to transcribe educational YouTube videos, such as lectures or conference talks, for detailed analysis, note-taking, or citation purposes. It supports speaker labeling, making it ideal for multi-speaker content like panel discussions.
Content creators, editors, and marketers can generate clean transcripts for YouTube videos to create subtitles, blog posts, or social media snippets. The verbatim output without timestamps streamlines repurposing video content into written formats.
Legal professionals can transcribe YouTube videos containing testimonies, public statements, or training sessions for evidence gathering or compliance records. Speaker labels help identify individuals in multi-party recordings.
Organizations focused on accessibility can use this skill to provide transcripts for deaf or hard-of-hearing users, enhancing video accessibility on platforms like YouTube. It supports creating readable text versions without technical clutter.
Companies can transcribe internal training videos or external webinars hosted on YouTube for employee reference, archiving, or translation into other languages. The clean format facilitates integration into learning management systems.
Offer a free tier for basic transcript generation with limited videos per month, and premium plans for higher volume, faster processing, or API access. Revenue can come from subscriptions and enterprise licenses.
Provide the transcription functionality as an API that developers can integrate into their applications, such as content management systems or e-learning platforms. Charge based on usage tiers or per-transaction fees.
License the skill to marketing agencies, legal firms, or educational institutions as a white-label tool they can rebrand and offer to their clients. Revenue is generated through one-time licensing fees or ongoing support contracts.
💬 Integration Tip
Store GEMINI_API_KEY securely as an environment variable, and make sure Python 3 is installed in the workspace so the script can run.
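A preflight check along these lines can catch both problems before the script is run. This is a sketch, not part of the skill: the function name `missing_requirements` is hypothetical, and it only checks that the variable is non-empty and that a `python3` executable is on the PATH.

```python
import os
import shutil


def missing_requirements(env=None, which=shutil.which):
    """Return a list of human-readable problems; empty means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    if which("python3") is None:
        problems.append("python3 not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("Missing requirement:", problem)
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.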