video-understanding

Analyze and summarize videos from 1000+ sites using Google Gemini AI, providing transcripts, descriptions, summaries, and answers to questions.
Install via ClawdBot CLI:
clawdbot install bill492/video-understanding

Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.
- yt-dlp: brew install yt-dlp / pip install yt-dlp
- ffmpeg: brew install ffmpeg (for merging video+audio streams)
- GEMINI_API_KEY environment variable

Returns structured JSON:
- [MM:SS] timestamps

uv run {baseDir}/scripts/analyze_video.py "<video-url>"
uv run {baseDir}/scripts/analyze_video.py "<video-url>" -q "What product is shown?"
uv run {baseDir}/scripts/analyze_video.py "<video-url>" -p "Custom prompt" --raw
uv run {baseDir}/scripts/analyze_video.py "<video-url>" --download-only -o video.mp4
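
The commands above can also be driven from Python. The following is a minimal sketch, not part of the skill itself: `build_command` and `analyze` are hypothetical helper names, `script_path` stands in for the resolved `{baseDir}/scripts/analyze_video.py`, and the code assumes the script prints nothing but the JSON document to stdout when `--raw` is not set.

```python
import json
import subprocess
from typing import Optional


def build_command(script_path: str, url: str, question: Optional[str] = None) -> list[str]:
    """Build the argv for one analysis run, mirroring the CLI examples."""
    cmd = ["uv", "run", script_path, url]
    if question:
        # -q adds a targeted question on top of the default analysis.
        cmd += ["-q", question]
    return cmd


def analyze(script_path: str, url: str, question: Optional[str] = None) -> dict:
    """Run the script and parse stdout as JSON.

    Assumption: default (non --raw) output is a single JSON document on stdout.
    """
    result = subprocess.run(
        build_command(script_path, url, question),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with URLs and free-text questions.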
| Flag | Description | Default |
|------|-------------|---------|
| -q / --question | Question to answer (added to default fields) | none |
| -p / --prompt | Override entire prompt (ignores -q) | structured JSON |
| -m / --model | Gemini model | gemini-2.5-flash |
| -o / --output | Save output to file | stdout |
| --keep | Keep downloaded video file | false |
| --download-only | Download only, skip analysis | false |
| --max-size | Max file size in MB | 500 |
| --raw | Raw text output instead of JSON | false |
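
To make the flag semantics concrete, here is a sketch of an `argparse` parser that mirrors the table. It is an illustration built from the table alone, not the script's actual parser; `build_parser` and `effective_question` are hypothetical names, and the help strings are paraphrased.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags and defaults (illustrative only)."""
    parser = argparse.ArgumentParser(prog="analyze_video.py")
    parser.add_argument("url", help="Video URL (any site yt-dlp supports)")
    parser.add_argument("-q", "--question", default=None,
                        help="Question to answer, added to the default fields")
    parser.add_argument("-p", "--prompt", default=None,
                        help="Override the entire prompt (ignores -q)")
    parser.add_argument("-m", "--model", default="gemini-2.5-flash",
                        help="Gemini model")
    parser.add_argument("-o", "--output", default=None,
                        help="Save output to file instead of stdout")
    parser.add_argument("--keep", action="store_true",
                        help="Keep the downloaded video file")
    parser.add_argument("--download-only", action="store_true",
                        help="Download only, skip analysis")
    parser.add_argument("--max-size", type=int, default=500,
                        help="Max file size in MB")
    parser.add_argument("--raw", action="store_true",
                        help="Raw text output instead of JSON")
    return parser


def effective_question(args: argparse.Namespace) -> Optional[str]:
    """-p replaces the whole prompt, so -q is dropped when both are given."""
    return None if args.prompt else args.question
```

The one interaction worth noting is between `-p` and `-q`: per the table, a custom prompt overrides everything, so a question passed alongside it has no effect.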
Any URL supported by yt-dlp: Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, Dailymotion, Twitch, and 1000+ more.
-q for targeted questions on top of the full analysis
uv
Generated Mar 1, 2026
Educators and e-learning platforms can use this skill to automatically transcribe and summarize instructional videos, generating structured notes and answering student questions about video content. It supports platforms like YouTube and Loom, making it ideal for online courses and training materials.
Marketing teams can analyze videos from TikTok, Instagram, and Twitter/X to track brand mentions, understand visual trends, and generate summaries for reporting. The skill extracts transcripts and descriptions, aiding in content strategy and competitive analysis.
Support teams can analyze customer-submitted videos from platforms like Loom to quickly transcribe issues, describe visual elements (e.g., UI errors), and answer specific questions, improving response times and accuracy in troubleshooting.
Production studios can use this skill to generate transcripts and visual descriptions for raw footage from various sources, aiding in editing, subtitling, and content indexing. It handles large videos via Gemini's File API, streamlining post-production processes.
Legal firms can analyze video evidence from depositions or surveillance, extracting verbatim transcripts with timestamps and summarizing key events. This assists in case preparation and compliance reporting by automating video content review.
Offer a cloud-based service where users upload video URLs to receive automated analysis reports via API. Charge based on usage tiers (e.g., number of videos processed per month) and include premium features like custom prompts or higher file size limits.
License the skill as part of a larger enterprise software suite for industries like education or media, providing tailored solutions with dedicated support. Revenue comes from one-time licensing fees or annual contracts with customization options.
Provide a free basic version for individual users with limited features (e.g., max video size or analysis frequency) and upsell to a paid plan for advanced capabilities like batch processing, API access, or priority support. Monetize through premium upgrades.
💬 Integration Tip
Install yt-dlp (via brew or pip) and ffmpeg (via brew), and set the GEMINI_API_KEY environment variable before use. For YouTube URLs, use direct Gemini processing to avoid download delays.
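
A quick preflight check can catch a missing prerequisite before any video is downloaded. This is a sketch under stated assumptions: `missing_prerequisites` is a hypothetical helper, and it assumes both tools are invoked as `yt-dlp` and `ffmpeg` on the `PATH`. The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the names of prerequisites that are not available."""
    # shutil.which returns None when the executable is not on PATH.
    missing = [tool for tool in ("yt-dlp", "ffmpeg") if which(tool) is None]
    # An empty GEMINI_API_KEY is treated the same as an unset one.
    if not env.get("GEMINI_API_KEY"):
        missing.append("GEMINI_API_KEY")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        raise SystemExit("Missing prerequisites: " + ", ".join(problems))
    print("All prerequisites found.")
```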
Extract frames or short clips from videos using ffmpeg.
Download videos, audio, subtitles, and clean paragraph-style transcripts from YouTube and any other yt-dlp supported site. Use when asked to “download this video”, “save this clip”, “rip audio”, “get subtitles”, “get transcript”, or to troubleshoot yt-dlp/ffmpeg and formats/playlists.
Generate SRT subtitles from video/audio with translation support. Transcribes Hebrew (ivrit.ai) and English (whisper), translates between languages, burns subtitles into video. Use for creating captions, transcripts, or hardcoded subtitles for WhatsApp/social media.
Create AI videos with optimized prompts, motion control, and platform-ready output.
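Automatically log in to a Douyin account, then upload and publish videos to the Douyin creator platform; supports video tag management and login-status checks.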
Create product demo videos by automating browser interactions and capturing frames. Use when the user wants to record a demo, walkthrough, product showcase, or interactive video of a web application. Supports Playwright CDP screencast for high-quality capture and FFmpeg for video encoding.