
youtube-watcher: Read Any YouTube Video in Seconds

March 17, 2026·5 min read

With 10,700+ downloads and 81 stars, youtube-watcher is one of the highest-rated utility skills on ClawHub. Built by @michael gathara, it uses yt-dlp to extract YouTube video transcripts — giving your OpenClaw agent the ability to read, summarize, and answer questions about any video that has subtitles, without a YouTube API key, without a browser, and without watching the video.

The Problem It Solves

Video content is locked inside audio and visual streams that AI agents can't directly process. A 45-minute conference talk, a product demo, a tutorial: agents have no way to access this information without transcripts. The YouTube Data API provides captions, but downloading them requires OAuth setup, counts against a daily quota, and only works for videos where the uploader has granted access.

yt-dlp sidesteps all of this. It extracts manual and auto-generated captions from the same data YouTube's web player loads, so no API key is needed. youtube-watcher packages this capability as a one-command OpenClaw skill.

Core Concept

The skill wraps a Python script that calls yt-dlp to download and output a video's transcript as plain text. Your agent reads the text and processes it like any other document.

YouTube URL → yt-dlp → Subtitle file → Plain text → Agent reads

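No API quota. No OAuth. If a video has captions (manual or auto-generated), the skill gets them. There is no formal rate limit, but YouTube can throttle or temporarily block clients that send a high volume of requests.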
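The "Subtitle file → Plain text" step does most of the cleanup. yt-dlp typically saves YouTube captions as WebVTT, which wraps the spoken text in a header, cue timing lines and, for auto-generated captions, inline word-timing tags plus lines repeated from one cue to the next. Below is a minimal sketch of that conversion, assuming VTT input; `vtt_to_cues` and `vtt_to_text` are hypothetical names, not functions from the skill's own get_transcript.py.

```python
from __future__ import annotations

import html
import re

# A cue timing line, e.g. "00:00:01.000 --> 00:00:03.500 align:start position:0%"
TIMING_RE = re.compile(r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")
# Inline markup inside cue text, e.g. "<00:00:01.200>", "<c>", "</c>"
TAG_RE = re.compile(r"<[^>]+>")


def vtt_to_cues(vtt: str) -> list[tuple[str, list[str]]]:
    """Parse WebVTT into (start_timestamp, text_lines) pairs, dropping markup."""
    cues = []
    # Cues are separated by blank lines; the header block has no timing line.
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = [line.strip() for line in block.splitlines()]
        # Lines before the timing line (e.g. a numeric cue id) are skipped.
        for i, line in enumerate(lines):
            if TIMING_RE.match(line):
                start = line.split()[0]
                # Strip tags first, then decode entities such as &amp;.
                text = [
                    " ".join(html.unescape(TAG_RE.sub("", t)).split())
                    for t in lines[i + 1:]
                ]
                text = [t for t in text if t]
                if text:
                    cues.append((start, text))
                break  # the rest of the block was consumed as cue text
    return cues


def vtt_to_text(vtt: str) -> str:
    """Flatten cues to plain text, skipping a line identical to the one before it."""
    out: list[str] = []
    for _start, lines in vtt_to_cues(vtt):
        for line in lines:
            if not out or out[-1] != line:
                out.append(line)
    return " ".join(out)
```

Keeping each cue's start timestamp is what makes a timestamped outline possible. The consecutive-duplicate check is a simple heuristic for auto-caption roll-up and will also drop a line that is genuinely spoken twice in a row.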

Deep Dive

Basic Usage

python3 {baseDir}/scripts/get_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID"

The {baseDir} placeholder is resolved automatically by OpenClaw to the skill's installation directory. The output is the video's full transcript as plain text, ready for the agent to process.
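The post doesn't show get_transcript.py itself. As a rough sketch of what such a wrapper could look like, the code below shells out to the yt-dlp CLI and asks for subtitles only. The flags (`--skip-download`, `--write-subs`, `--write-auto-subs`, `--sub-langs`, `--sub-format`, `-o`) are real yt-dlp options; the function names, the English default, and the temporary-directory handling are assumptions for illustration.

```python
import glob
import os
import subprocess
import tempfile


def build_yt_dlp_command(url: str, out_dir: str, lang: str = "en") -> list:
    """Assemble a yt-dlp call that saves captions only, never the video."""
    return [
        "yt-dlp",
        "--skip-download",    # fetch subtitles only
        "--write-subs",       # manual captions, if the uploader provided them
        "--write-auto-subs",  # auto-generated captions where no manual track exists
        "--sub-langs", lang,
        "--sub-format", "vtt",
        "-o", os.path.join(out_dir, "%(id)s.%(ext)s"),
        url,
    ]


def fetch_vtt(url: str, lang: str = "en") -> str:
    """Run yt-dlp (hypothetical wrapper) and return the first .vtt file it writes."""
    with tempfile.TemporaryDirectory() as out_dir:
        subprocess.run(
            build_yt_dlp_command(url, out_dir, lang),
            check=True, capture_output=True, text=True,
        )
        # yt-dlp names subtitle files like "<id>.<lang>.vtt"
        files = sorted(glob.glob(os.path.join(out_dir, "*.vtt")))
        if not files:
            raise RuntimeError(f"No {lang} captions found for {url}")
        with open(files[0], encoding="utf-8") as fh:
            return fh.read()
```

When both kinds of track exist for the requested language, yt-dlp writes the manual one. The returned WebVTT still needs its timing lines and markup stripped before it reads as plain text.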

Summarizing a Video

Agent workflow:

Fetch transcript with get_transcript.py
Read the plain text output
Produce a structured summary

The same workflow applies to:

Conference talks (structure: speaker, key points, Q&A)
Tutorial videos (structure: steps, commands, prerequisites)
Product demos (structure: features shown, use cases demonstrated)
Podcasts uploaded to YouTube (structure: topics discussed, guests, timestamps)
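One way to apply those per-genre structures is to turn each into a prompt template. The sketch below is a hypothetical helper (`SUMMARY_STRUCTURES` and `build_summary_prompt` are illustrative names, not part of youtube-watcher); the section names are copied from the list above.

```python
# Hypothetical mapping from video type to the sections a summary should cover.
SUMMARY_STRUCTURES = {
    "talk": ["speaker", "key points", "Q&A"],
    "tutorial": ["steps", "commands", "prerequisites"],
    "demo": ["features shown", "use cases demonstrated"],
    "podcast": ["topics discussed", "guests", "timestamps"],
}


def build_summary_prompt(video_type: str, transcript: str) -> str:
    """Build a summarization prompt that asks for the sections of `video_type`."""
    sections = SUMMARY_STRUCTURES.get(video_type)
    if sections is None:
        raise ValueError(f"unknown video type: {video_type!r}")
    headings = "\n".join(f"- {name}" for name in sections)
    return (
        "Summarize the following video transcript using these sections:\n"
        f"{headings}\n\n"
        "Transcript:\n"
        f"{transcript}"
    )
```

The "timestamps" section for podcasts only makes sense if the transcript text actually contains timestamps.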

Finding Specific Information

# Fetch the transcript
python3 {baseDir}/scripts/get_transcript.py "https://youtu.be/abc123"
 
# Agent searches the output for specific terms
# "What did they say about rate limits?"
# "Find all mentions of the API key setup"
# "What are the exact commands shown in the tutorial?"

Because the transcript is plain text, any string search or semantic analysis the agent can do on documents applies directly to video content.
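For literal questions like "Find all mentions of the API key setup", a case-insensitive substring search with a little surrounding context is often enough; semantic questions are left to the agent's model. `find_mentions` is a hypothetical helper, and the 40-character context window is an arbitrary default.

```python
import re


def find_mentions(transcript: str, term: str, context: int = 40) -> list:
    """Return a snippet around each case-insensitive occurrence of `term`."""
    snippets = []
    # re.escape treats the search term literally (no regex metacharacters).
    for match in re.finditer(re.escape(term), transcript, flags=re.IGNORECASE):
        start = max(0, match.start() - context)
        end = min(len(transcript), match.end() + context)
        snippets.append(transcript[start:end])
    return snippets
```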

Multi-Video Research

For a research task spanning multiple videos:

# Process multiple videos sequentially
python3 {baseDir}/scripts/get_transcript.py "https://youtu.be/video1"
python3 {baseDir}/scripts/get_transcript.py "https://youtu.be/video2"
python3 {baseDir}/scripts/get_transcript.py "https://youtu.be/video3"

The agent can compare transcripts, synthesize recurring themes, or identify contradictions between sources. Research that would take hours of video watching becomes a matter of fetching and reading text.
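The sequential calls above can be scripted so that one failing video (for example, one with no captions) doesn't stop the batch. This sketch assumes get_transcript.py prints the transcript to stdout and exits non-zero on failure; the helper names and the relative script path are hypothetical. The fetch function is a parameter so it can be swapped out or stubbed.

```python
from __future__ import annotations

import subprocess
from typing import Callable


def run_skill_script(url: str, script: str = "scripts/get_transcript.py") -> str:
    """Run the skill's script for one URL and return what it prints.

    Assumes the transcript goes to stdout; point `script` at the real
    {baseDir}/scripts/get_transcript.py path on your machine.
    """
    result = subprocess.run(
        ["python3", script, url], capture_output=True, text=True, check=True
    )
    return result.stdout


def collect_transcripts(
    urls: list[str], fetch: Callable[[str], str] = run_skill_script
) -> tuple[dict[str, str], dict[str, str]]:
    """Fetch each URL in order; return (transcripts, failures) keyed by URL."""
    transcripts: dict[str, str] = {}
    failures: dict[str, str] = {}
    for url in urls:
        try:
            transcripts[url] = fetch(url)
        except Exception as exc:  # e.g. no captions; keep going with the rest
            failures[url] = str(exc)
    return transcripts, failures
```

Reporting failures separately tells the agent which videos are missing from the corpus before it starts comparing sources.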

Setup

clawhub install youtube-watcher

Install yt-dlp (required):

# macOS
brew install yt-dlp
 
# Python (any OS)
pip install yt-dlp

Verify installation:

yt-dlp --version

No API keys. No accounts. No configuration beyond installing yt-dlp.

Comparison: YouTube Transcript Tools

Tool	API Key Required	Rate Limits	Auto-captions	Cost
youtube-watcher (yt-dlp)	❌	⚠️ no quota, but YouTube may throttle heavy use	✅	Free
YouTube Data API v3	✅	✅ daily quota	✅	Free (within quota)
youtube-transcript-api (Python)	❌	⚠️ soft limits	✅	Free
Whisper (audio transcription)	❌	❌	N/A (transcribes audio, so works on any video)	Free (local compute)

youtube-watcher's advantage is simplicity: one dependency (yt-dlp), one script, and it works on any video that already has captions. For videos without captions, the OpenClaw openai-whisper skill can transcribe the audio instead, which covers youtube-watcher's main limitation.
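The captions-first, audio-second fallback can be expressed as a small function. Both callables are hypothetical stand-ins: `fetch_captions` might wrap youtube-watcher and `transcribe_audio` the openai-whisper skill, but neither skill's actual interface is shown in this post.

```python
from __future__ import annotations

from typing import Callable


def transcript_with_fallback(
    url: str,
    fetch_captions: Callable[[str], str],
    transcribe_audio: Callable[[str], str],
) -> tuple[str, str]:
    """Try captions first; fall back to audio transcription if none are usable.

    Returns (text, source) where source is "captions" or "audio".
    """
    try:
        text = fetch_captions(url)
        if text.strip():
            return text, "captions"
    except Exception:
        pass  # treat any caption failure as "no captions available"
    return transcribe_audio(url), "audio"
```

Captions are tried first because they are cheap to fetch; audio transcription needs the audio itself plus local compute.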

Practical Tips

Check caption availability first — videos without manual or auto-generated subtitles will fail with an error; check the YouTube player for the CC indicator
Prefer manual captions over auto-generated — auto-generated captions (especially for technical content) may have errors in terminology; the skill picks up whatever is available
Combine with timestamp references — some transcripts include timestamps; ask the agent to extract them for a timestamped outline
Batch research workflows — for conference playlists or tutorial series, process all videos then ask questions across the entire corpus
Language selection — yt-dlp can fetch captions in specific languages; modify the script to prefer a language if the video has multiple subtitle tracks
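Caption availability, manual-versus-auto preference, and language choice all come down to deciding which track to fetch. yt-dlp's `YoutubeDL.extract_info(url, download=False)` returns a dictionary whose `subtitles` key maps language codes to manual tracks and whose `automatic_captions` key does the same for auto-generated ones. Below is a sketch of the selection logic, assuming a manual track in any preferred language should beat auto-captions; `pick_caption_track` is hypothetical, and the skill's own script may choose differently.

```python
from __future__ import annotations


def pick_caption_track(info: dict, preferred: list[str]) -> tuple[str, str] | None:
    """Return (language, kind) for the best track, or None if nothing matches."""
    manual = info.get("subtitles") or {}
    auto = info.get("automatic_captions") or {}
    # Manual captions in any preferred language win over auto-captions.
    for lang in preferred:
        if lang in manual:
            return lang, "manual"
    for lang in preferred:
        if lang in auto:
            return lang, "auto"
    return None  # no captions in a preferred language: consider audio transcription


# Example with yt-dlp's Python API (needs `pip install yt-dlp` and network access):
# import yt_dlp
# with yt_dlp.YoutubeDL({"quiet": True}) as ydl:
#     info = ydl.extract_info("https://youtu.be/VIDEO_ID", download=False)
# track = pick_caption_track(info, ["en", "en-US"])
```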

Considerations

Captions required: Videos without subtitles (manual or auto-generated) cannot be transcribed by this skill. Audio-only transcription requires a separate tool like openai-whisper.
Auto-caption quality varies: Technical jargon, proper nouns, and heavily accented speech frequently produce errors in auto-generated captions. Treat auto-caption transcripts as approximate.
YouTube ToS: yt-dlp downloads publicly available subtitle data. For personal/research use this is straightforward; automated large-scale scraping may conflict with YouTube's Terms of Service.
Video length: Very long videos (3+ hours) produce very large transcripts that may not fit in the agent's context window. Consider splitting the transcript into chunks and summarizing each chunk before asking questions about the whole video.
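Chunking can be as simple as splitting on word counts with a small overlap, so text near a boundary appears in both neighboring chunks. Word count is only a rough proxy for model tokens, and the defaults below are arbitrary; `chunk_words` is a hypothetical helper. The usual pattern is to summarize each chunk, then summarize the summaries.

```python
def chunk_words(text: str, max_words: int = 2000, overlap: int = 100) -> list:
    """Split text into chunks of at most `max_words` words.

    Consecutive chunks share `overlap` words so context isn't lost at a cut.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reached the end of the text
    return chunks
```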

The Bigger Picture

youtube-watcher democratizes access to video knowledge for AI agents. The amount of technical content, tutorials, interviews, and conference talks published to YouTube every day vastly exceeds what can be consumed by watching. For agents tasked with staying current in a field, answering questions from video sources, or synthesizing across a content creator's catalog, youtube-watcher turns YouTube from a streaming platform into a searchable knowledge base.

View the skill on ClawHub: youtube-watcher
