YouTube Transcript: Fetch Any Video's Full Text via Residential IP Proxy
14,337 downloads and 21 stars. The YouTube Transcript skill — by xthezealot — solves a specific problem that every developer running agents on cloud VPS infrastructure eventually hits: YouTube blocks transcript requests from data center IP ranges.
This skill routes around that block using a WireGuard VPN to a residential IP, making transcript fetching reliable from the same cloud environments where youtube_transcript_api silently fails.
The Problem It Solves
YouTube's transcript API works fine from home internet. It breaks the moment you run it from an AWS EC2, Google Cloud, DigitalOcean, or any other data center IP range. YouTube detects the IP range and returns errors or empty results — not because the video lacks a transcript, but because the origin looks like a scraper farm.
For AI agents running on VPS infrastructure, this means transcript-fetching either requires complex workarounds or silently fails mid-pipeline. youtube-transcript handles this transparently: it checks whether a VPN is needed, brings it up if necessary, and fetches the transcript through a residential IP.
Core Concept: Residential Proxy via WireGuard
The skill's key mechanism is a WireGuard tunnel configured to exit through a residential IP address. When the agent calls the script:
- It checks the current network path and whether YouTube is accessible from the current IP
- If the connection would be blocked (cloud VPS detected), it brings up the WireGuard tunnel automatically
- The transcript request goes out through the residential exit IP
- Results are returned as structured JSON and the tunnel state is maintained for subsequent calls
The residential proxy configuration lives in references/SETUP.md, which walks through WireGuard installation and configuration for common VPS providers.
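The check-then-tunnel flow described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the interface name `wg0`, the function names, and the injected `youtube_reachable` probe are all assumptions. Only `wg show` and `wg-quick up` are real WireGuard commands.

```python
import subprocess
from typing import Callable

WG_INTERFACE = "wg0"  # hypothetical interface name; use whatever SETUP.md configures


def tunnel_is_up(interface: str = WG_INTERFACE) -> bool:
    """Return True if `wg show <interface>` succeeds (usually needs root)."""
    try:
        result = subprocess.run(["wg", "show", interface], capture_output=True)
    except FileNotFoundError:
        # WireGuard tools are not installed on this machine.
        return False
    return result.returncode == 0


def bring_up_tunnel(interface: str = WG_INTERFACE) -> None:
    """Start the tunnel with wg-quick; raises CalledProcessError on failure."""
    subprocess.run(["wg-quick", "up", interface], check=True)


def ensure_route(
    youtube_reachable: Callable[[], bool],
    is_up: Callable[[], bool] = tunnel_is_up,
    bring_up: Callable[[], None] = bring_up_tunnel,
) -> str:
    """Decide how the next transcript request should leave the machine."""
    if is_up():
        return "tunnel"      # tunnel already running, reuse it
    if youtube_reachable():
        return "direct"      # e.g. a home or office connection
    bring_up()               # blocked origin: start the residential exit
    return "tunnel"
```

Passing the probe and the tunnel commands in as callables keeps the decision logic testable without root access or a live WireGuard configuration.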
Deep Dive
Basic Usage
python3 scripts/fetch_transcript.py <video_id_or_url>
Both video ID and full URL are accepted:
python3 scripts/fetch_transcript.py dQw4w9WgXcQ
python3 scripts/fetch_transcript.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
Output Format
The script returns structured JSON:
{
"video_id": "dQw4w9WgXcQ",
"title": "Video title here",
"author": "Channel name",
"full_text": "Complete transcript as a single string...",
"transcript": [
{ "start": 0.0, "duration": 4.2, "text": "First line of transcript" },
{ "start": 4.2, "duration": 3.8, "text": "Second line..." }
]
}
The full_text field is the most useful for summarization tasks: pass it directly to Claude. The transcript array with timestamps is useful when you need to reference specific moments ("at 2:34, the speaker mentions...").
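To make the timestamp use case concrete, here is a small sketch that works on the JSON shape shown above. The helper names `format_timestamp` and `segment_at` are illustrative, not part of the skill.

```python
import json
from typing import Optional


def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS for long videos."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def segment_at(transcript: list, seconds: float) -> Optional[dict]:
    """Return the transcript entry being spoken at the given time, if any."""
    for seg in transcript:
        if seg["start"] <= seconds < seg["start"] + seg["duration"]:
            return seg
    return None


# Sample data copied from the output format above.
sample = json.loads("""
{
  "video_id": "dQw4w9WgXcQ",
  "full_text": "Complete transcript as a single string...",
  "transcript": [
    {"start": 0.0, "duration": 4.2, "text": "First line of transcript"},
    {"start": 4.2, "duration": 3.8, "text": "Second line..."}
  ]
}
""")

hit = segment_at(sample["transcript"], 5.0)
if hit is not None:
    print(f"at {format_timestamp(hit['start'])}: {hit['text']}")
```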
Language Selection
Default fetch priority: en, fr, de, es, it, pt, nl
Override with a custom priority list:
python3 scripts/fetch_transcript.py VIDEO_ID "ja,ko,zh"
The skill picks the first available language from your priority list. If the video only has auto-generated captions, it falls back to those.
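One plausible reading of that selection rule is sketched below. The track shape (`lang` and `auto` keys) and both function names are assumptions made for illustration, and the skill's real logic may order things differently. Here, human-written tracks are preferred across the whole priority list before any auto-generated track is considered.

```python
from typing import List, Optional

DEFAULT_PRIORITY = ["en", "fr", "de", "es", "it", "pt", "nl"]


def parse_priority(arg: Optional[str]) -> List[str]:
    """Turn a CLI argument like "ja,ko,zh" into an ordered list of codes."""
    if not arg:
        return list(DEFAULT_PRIORITY)
    return [code.strip() for code in arg.split(",") if code.strip()]


def pick_track(tracks: List[dict], priority: List[str]) -> Optional[dict]:
    """Pick a caption track: manual tracks first, then auto-generated ones."""
    for want_auto in (False, True):
        for lang in priority:
            for track in tracks:
                if track["lang"] == lang and track["auto"] == want_auto:
                    return track
    return None  # no track in any requested language


# Hypothetical track list for one video.
available = [
    {"lang": "ja", "auto": True},
    {"lang": "ko", "auto": False},
]
print(pick_track(available, parse_priority("ja,ko,zh")))
```

With this ordering the manual Korean track wins over the auto-generated Japanese one, even though `ja` comes first in the list. Swapping the two loops would favor list order over caption quality instead.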
Typical Agent Workflow
1. Run fetch_transcript.py with video ID or URL
2. Script checks VPN, brings it up if needed
3. Returns JSON with full transcript text
4. Summarize the full_text field as needed
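From an agent's side, the workflow above might be wrapped like this. It is a sketch under assumptions: that the script prints its JSON to stdout and exits non-zero on failure (neither is confirmed by the skill's documentation here), and that the wrapper name `fetch_transcript` and its `run` parameter are your own.

```python
import json
import subprocess
from typing import Callable, Optional

SCRIPT = "scripts/fetch_transcript.py"  # path taken from the usage examples


def fetch_transcript(
    video: str,
    languages: Optional[str] = None,
    run: Callable = subprocess.run,
) -> Optional[dict]:
    """Run the fetch script and return its parsed JSON, or None on failure."""
    cmd = ["python3", SCRIPT, video]
    if languages:
        cmd.append(languages)  # e.g. "ja,ko,zh"
    proc = run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        return None  # assumed failure signal, e.g. no transcript available
    try:
        data = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return None  # output was not the expected JSON
    if not isinstance(data, dict) or not data.get("full_text"):
        return None
    return data
```

Returning `None` rather than raising lets a pipeline skip videos without transcripts and carry on. Raise an exception instead if a missing transcript should stop the run.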
Comparison: YouTube Transcript Solutions
| Solution | Works on Cloud VPS | No API Key | Timestamps | JSON Output |
|---|---|---|---|---|
| youtube_transcript_api (Python) | ❌ Often blocked | ✅ | ✅ | ✅ |
| YouTube Data API v3 | ✅ | ❌ Requires key | ✅ | ✅ |
| youtube-transcript skill | ✅ Via WireGuard | ✅ | ✅ | ✅ |
| Manual copy-paste | ✅ | ✅ | ❌ | ❌ |
How to Install
clawhub install youtube-transcript
Follow the WireGuard setup in references/SETUP.md before first use. The setup covers:
- Python dependency installation
- WireGuard VPN configuration for cloud VPS environments
- Troubleshooting common errors
- Alternative proxy options if WireGuard isn't available
If you're running locally (home or office network), you may not need the VPN configuration at all — the script checks the network situation before routing.
Practical Tips
- Set up WireGuard first. The skill's value proposition depends on the proxy working. Follow the SETUP.md before running anything. Skipping this on a cloud VPS means silent failures.
- Use full_text for summarization. Don't try to parse the timestamped transcript for summary tasks; full_text is already concatenated and ready to pass to Claude.
- Cache transcripts locally. Transcript content doesn't change for published videos. If you're processing the same video repeatedly, save the JSON output and skip the fetch.
- Language priority matters. For international content, always specify your preferred language list. Auto-generated caption quality varies significantly by language.
- Pair with web-search for context. Fetch the transcript, then search for related articles or discussions to cross-reference claims in the video.
- The skill handles URL normalization. Shorts URLs, watch URLs, and video IDs all work, so there is no need to strip parameters manually.
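The caching and URL-normalization tips combine naturally: key the cache on the normalized video ID so that different URL forms of the same video share one file. The sketch below is not the skill's own normalizer. It assumes video IDs are 11 characters from `[A-Za-z0-9_-]`, which matches current YouTube IDs but is not a documented guarantee, and the `transcript_cache` directory and function names are hypothetical.

```python
import json
import re
from pathlib import Path
from typing import Callable, Optional
from urllib.parse import parse_qs, urlparse

VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")  # assumed ID shape


def extract_video_id(value: str) -> Optional[str]:
    """Accept a bare ID, a watch URL, a Shorts URL, or a youtu.be link."""
    value = value.strip()
    if VIDEO_ID_RE.fullmatch(value):
        return value
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.strip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] == "shorts":
                candidate = parts[1]
    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None


def cached_fetch(
    video: str,
    fetch: Callable[[str], Optional[dict]],
    cache_dir: str = "transcript_cache",  # hypothetical location
) -> Optional[dict]:
    """Return the cached transcript if present, else fetch and store it."""
    video_id = extract_video_id(video)
    if video_id is None:
        raise ValueError(f"not a recognizable YouTube video: {video!r}")
    path = Path(cache_dir) / f"{video_id}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch(video_id)
    if data is not None:  # do not cache failures
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

Here `fetch` is any callable that takes a video ID and returns the parsed JSON, such as a thin wrapper around the script invocation.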
Considerations
- WireGuard setup is required for cloud environments. This is not a zero-configuration skill for VPS users. Setting up the residential proxy requires following the setup guide — plan for 15–30 minutes on first install.
- Residential proxy dependency. The proxy's reliability depends on your VPN provider. If the exit IP gets blocked by YouTube, you'll need to rotate to a new residential IP.
- Videos without transcripts return errors. Live streams, some music videos, and videos with disabled captions won't have fetchable transcripts. Handle this in your agent workflow.
- Auto-generated captions can be noisy. For accuracy-sensitive tasks (legal content, medical information), manually verify that the transcript is human-generated and not auto-captioned.
- Security review status. Third-party skill auditors (Playbooks.com, openclaw.army) have flagged youtube-transcript as "Needs review" with a C70/100 health score, categorizing it as "dual-use" due to its platform-bypass mechanism. The skill itself does not exfiltrate data or execute malicious commands; the flag reflects the residential proxy pattern rather than confirmed malware. That said, review the SKILL.md and references/SETUP.md before installing in a sensitive environment.
- Terms of service. Fetching transcripts for personal/agent use is generally accepted, but high-volume automated fetching of copyrighted content may conflict with YouTube's ToS. Review the terms for your specific use case.
The Bigger Picture
The youtube-transcript skill represents a growing pattern in the OpenClaw ecosystem: skills that exist specifically to solve infrastructure-layer problems that block otherwise-simple agent tasks.
YouTube has become the world's largest knowledge base — tutorials, lectures, conference talks, interviews, and primary source material that doesn't exist as text anywhere else. The ability to reliably fetch transcripts at scale, even from cloud environments, unlocks this knowledge for AI agent pipelines.
With 14,000+ downloads, the demand is clearly real. Whether it's a research agent pulling conference talks, a content summarizer processing educational channels, or a competitive intelligence agent monitoring industry news — youtube-transcript makes YouTube first-class text for your agent.
View the skill on ClawHub: youtube-transcript