youtube-watcher: Read Any YouTube Video in Seconds
With 10,700+ downloads and 81 stars, youtube-watcher is one of the highest-rated utility skills on ClawHub. Built by @michael gathara, it uses yt-dlp to extract YouTube video transcripts — giving your OpenClaw agent the ability to read, summarize, and answer questions about any video that has subtitles, without a YouTube API key, without a browser, and without watching the video.
The Problem It Solves
Video content is locked inside audio and visual streams that AI agents can't directly process. A 45-minute conference talk, a product demo, a tutorial — agents have no way to access this information without transcripts. The YouTube Data API provides captions, but requires OAuth setup, has rate limits, and only works for videos where the uploader has granted access.
yt-dlp bypasses all of this. It extracts auto-generated and manual captions directly from YouTube's player, no API key needed. youtube-watcher packages this capability as a one-command OpenClaw skill.
Core Concept
The skill wraps a Python script that calls yt-dlp to download and output a video's transcript as plain text. Your agent reads the text and processes it like any other document.
YouTube URL → yt-dlp → Subtitle file → Plain text → Agent reads
No API quota. No rate limiting. No OAuth. If a video has captions (manual or auto-generated), the skill gets them.
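The pipeline above can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's actual get_transcript.py: the names build_ytdlp_command and vtt_to_text are hypothetical, English captions are assumed, and the VTT cleanup is deliberately simple. The yt-dlp flags are its standard subtitle options.

```python
import re

def build_ytdlp_command(url: str, out_template: str = "transcript") -> list[str]:
    """Build a yt-dlp invocation that fetches captions only (hypothetical helper)."""
    return [
        "yt-dlp",
        "--skip-download",     # fetch no video or audio, only subtitles
        "--write-subs",        # manual captions, if the uploader provided any
        "--write-auto-subs",   # auto-generated captions as well
        "--sub-langs", "en",   # assumption: English; change as needed
        "--sub-format", "vtt",
        "-o", out_template,    # subtitle files are written next to this template
        url,
    ]

# Matches cue timing lines such as "00:00:01.000 --> 00:00:03.000"
TIMING = re.compile(r"^(\d{2}:)?\d{2}:\d{2}\.\d{3} --> ")
# Matches inline markup such as <c> or <00:00:01.500>
TAG = re.compile(r"<[^>]+>")

def vtt_to_text(vtt: str) -> str:
    """Strip headers, timing lines, and inline tags from WebVTT content."""
    lines: list[str] = []
    for raw in vtt.splitlines():
        line = TAG.sub("", raw).strip()
        if not line or line.startswith("WEBVTT") or TIMING.match(line):
            continue
        if line.startswith(("Kind:", "Language:", "NOTE")):
            continue
        if lines and lines[-1] == line:
            continue  # auto-captions often repeat the previous line
        lines.append(line)
    return " ".join(lines)
```

Running the command (for example with subprocess.run) typically leaves a .vtt file beside the output template; passing that file's contents to vtt_to_text yields the plain text the agent reads.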
Deep Dive
Basic Usage
python3 {baseDir}/scripts/get_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID"
The {baseDir} placeholder is resolved automatically by OpenClaw to the skill's installation directory. Output is the full transcript as plain text, ready for the agent to process.
Summarizing a Video
Agent workflow:
- Fetch the transcript with get_transcript.py
- Read the plain text output
- Produce a structured summary
Works equally well for:
- Conference talks (structure: speaker, key points, Q&A)
- Tutorial videos (structure: steps, commands, prerequisites)
- Product demos (structure: features shown, use cases demonstrated)
- Podcasts uploaded to YouTube (structure: topics discussed, guests, timestamps)
Finding Specific Information
# Fetch the transcript
python3 {baseDir}/scripts/get_transcript.py "https://youtu.be/abc123"
# Agent searches the output for specific terms
# "What did they say about rate limits?"
# "Find all mentions of the API key setup"
# "What are the exact commands shown in the tutorial?"
Because the transcript is plain text, any string search or semantic analysis the agent can do on documents applies directly to video content.
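A plain string search over the transcript could look like the sketch below. The function name find_mentions and the width parameter are hypothetical; an agent would normally do this kind of lookup itself, so the code only illustrates that ordinary text tooling applies once the transcript is plain text.

```python
def find_mentions(transcript: str, term: str, width: int = 60) -> list[str]:
    """Return a snippet of surrounding text for each case-insensitive match."""
    needle = term.lower()
    if not needle:
        return []  # an empty search term would otherwise match everywhere
    lowered = transcript.lower()
    snippets: list[str] = []
    start = lowered.find(needle)
    while start != -1:
        lo = max(0, start - width)
        hi = min(len(transcript), start + len(needle) + width)
        snippets.append(transcript[lo:hi].strip())
        # Continue searching after the end of this match
        start = lowered.find(needle, start + len(needle))
    return snippets
```

Calling find_mentions(transcript, "rate limits") returns one short excerpt per occurrence, which is usually enough context to answer a question like "What did they say about rate limits?".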
Multi-Video Research
For a research task spanning multiple videos:
# Process multiple videos sequentially
python3 {baseDir}/scripts/get_transcript.py "https://youtu.be/video1"
python3 {baseDir}/scripts/get_transcript.py "https://youtu.be/video2"
python3 {baseDir}/scripts/get_transcript.py "https://youtu.be/video3"
The agent can compare across transcripts, synthesize recurring themes, or identify contradictions between sources. Research that would take hours of video watching is done in seconds.
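The sequential loop can also be driven from Python. The sketch below assumes the script prints the transcript to stdout and exits non-zero on failure, and it uses a relative script path as a stand-in for {baseDir}; collect_transcripts and fetch_with_script are hypothetical names. Failures are recorded per URL so one video without captions does not abort the whole batch.

```python
import subprocess
from typing import Callable

def fetch_with_script(url: str, script: str = "scripts/get_transcript.py") -> str:
    """Run the skill's script for one URL (assumed to print the transcript)."""
    result = subprocess.run(
        ["python3", script, url], capture_output=True, text=True, check=True
    )
    return result.stdout

def collect_transcripts(
    urls: list[str], fetch: Callable[[str], str] = fetch_with_script
) -> tuple[dict[str, str], dict[str, str]]:
    """Fetch each URL in turn; return (transcripts, failures) keyed by URL."""
    transcripts: dict[str, str] = {}
    failures: dict[str, str] = {}
    for url in urls:
        try:
            transcripts[url] = fetch(url)
        except Exception as exc:  # e.g. no captions, network error
            failures[url] = str(exc)
    return transcripts, failures
```

Passing a different fetch callable makes the loop easy to exercise without network access, and the failures mapping tells the agent which videos need a fallback such as audio transcription.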
Setup
clawhub install youtube-watcher
Install yt-dlp (required):
# macOS
brew install yt-dlp
# Python (any OS)
pip install yt-dlp
Verify installation:
yt-dlp --version
No API keys. No accounts. No configuration beyond installing yt-dlp.
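The same check can be made from Python before an agent workflow starts. This is a small sketch rather than part of the skill; yt_dlp_version is a hypothetical helper that looks for the yt-dlp executable on PATH and runs it with --version.

```python
import shutil
import subprocess
from typing import Optional

def yt_dlp_version() -> Optional[str]:
    """Return the installed yt-dlp version string, or None if it is not on PATH."""
    exe = shutil.which("yt-dlp")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return result.stdout.strip() or None
```

A return value of None is a signal to install yt-dlp before calling the transcript script.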
Comparison: YouTube Transcript Tools
| Tool | API Key Required | Rate Limits | Auto-captions | Cost |
|---|---|---|---|---|
| youtube-watcher (yt-dlp) | ❌ | ❌ | ✅ | Free |
| YouTube Data API v3 | ✅ | ✅ quota | ✅ | Free tier |
| youtube-transcript-api (Python) | ❌ | ⚠️ soft limits | ✅ | Free |
| Whisper (audio transcription) | ❌ | ❌ | ✅ any video | Free (local compute) |
youtube-watcher's advantage is simplicity: one dependency (yt-dlp), one script, works on any video with existing captions. For videos without captions, the OpenClaw openai-whisper skill can transcribe from audio — a useful fallback for youtube-watcher's main limitation.
Practical Tips
- Check caption availability first: videos without manual or auto-generated subtitles will fail with an error; check the YouTube player for the CC indicator
- Prefer manual captions over auto-generated: auto-generated captions (especially for technical content) may have errors in terminology; the skill picks up whatever is available
- Combine with timestamp references: some transcripts include timestamps; ask the agent to extract them for a timestamped outline
- Batch research workflows: for conference playlists or tutorial series, process all videos then ask questions across the entire corpus
- Language selection: yt-dlp can fetch captions in specific languages; modify the script to prefer a language if the video has multiple subtitle tracks
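The tips on preferring manual captions and choosing a language can be combined into one selection rule. The sketch below assumes two mappings from language code to track data, in the shape of the subtitles and automatic_captions fields of yt-dlp's info dict; pick_caption_track is a hypothetical helper, and the policy (any preferred manual track beats any auto-generated one) is one reasonable choice among several.

```python
from typing import Optional

def pick_caption_track(
    subtitles: dict, automatic_captions: dict, preferred: list[str]
) -> Optional[tuple[str, str]]:
    """Return (kind, language) for the best available track, or None."""
    # Manual tracks are checked first across all preferred languages,
    # then auto-generated tracks in the same language order.
    for kind, tracks in (("manual", subtitles), ("auto", automatic_captions)):
        for lang in preferred:
            if lang in tracks:
                return kind, lang
    return None
```

With preferred set to ["en", "de"], a video that has manual German captions and only auto-generated English ones yields ("manual", "de"). Swapping the two loops would favor language over caption quality instead.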
Considerations
- Captions required: Videos without subtitles (manual or auto-generated) cannot be transcribed by this skill. Audio-only transcription requires a separate tool like openai-whisper.
- Auto-caption quality varies: Technical jargon, proper nouns, and accented speech frequently produce errors in auto-generated captions. Treat auto-caption transcripts as approximate.
- YouTube ToS: yt-dlp downloads publicly available subtitle data. For personal/research use this is straightforward; automated large-scale scraping may conflict with YouTube's Terms of Service.
- Video length: Very long videos (3+ hours) produce very large transcripts. Consider chunking the transcript or asking for a summary before working with the full text.
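Chunking a long transcript can be as simple as splitting on word count with a small overlap, so that a sentence cut at a boundary still appears whole in one chunk. The sketch below is illustrative; chunk_transcript and its default sizes are assumptions, and a real workflow might size chunks by tokens rather than words.

```python
def chunk_transcript(text: str, max_words: int = 1500, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most max_words words, overlapping by `overlap`."""
    if max_words <= overlap:
        raise ValueError("max_words must be larger than overlap")
    words = text.split()
    chunks: list[str] = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

The agent can summarize each chunk separately and then summarize the summaries, which keeps every request within a manageable size.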
The Bigger Picture
youtube-watcher democratizes access to video knowledge for AI agents. The amount of technical content, tutorials, interviews, and conference talks published to YouTube every day vastly exceeds what can be consumed by watching. For agents tasked with staying current in a field, answering questions from video sources, or synthesizing across a content creator's catalog, youtube-watcher turns YouTube from a streaming platform into a searchable knowledge base.
View the skill on ClawHub: youtube-watcher