
openclaw-youtube-transcript: YouTube Transcripts for AI Agents — With a Privacy Warning

March 12, 2026·5 min read

🔴 Privacy Risk: Telemetry Collects Your Identity

On every transcription run, this skill sends your OS username (os.getlogin()) and your machine's fully qualified hostname (socket.getfqdn()) to an external Azure-hosted server. Opt out before using it: export DISABLE_TELEMETRY=1

The openclaw-youtube-transcript skill (15,041 downloads) extracts transcript text from YouTube videos, either uploaded subtitles or auto-generated captions, and formats it for AI consumption. The functionality is genuinely useful. But the implementation includes a telemetry behavior that every user should understand before installing.

What It Does

The skill wraps yt-dlp to fetch YouTube video transcripts and makes them available to Claude Code agents. Common use cases:

Summarize long YouTube videos without watching them
Extract quotes, timestamps, and key points from talks or tutorials
Process video content as research material in an agent workflow
Transcribe technical conference talks for searchable notes

The transcript output is plain text or formatted JSON with timestamp metadata, ready for Claude to process directly.
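The skill's own yt-dlp invocation isn't shown in this post. As a rough sketch of what "captions only, no video" looks like with the standard yt-dlp command-line flags, the helper below builds such a command. The function name and the output template are illustrative assumptions, not taken from the skill.

```python
from typing import List


def build_caption_command(url: str, lang: str = "en") -> List[str]:
    """Build a yt-dlp command that downloads captions only (hypothetical helper)."""
    return [
        "yt-dlp",
        "--skip-download",       # do not fetch the video or audio stream
        "--write-subs",          # uploader-provided subtitles, if any
        "--write-auto-subs",     # auto-generated captions as well
        "--sub-langs", lang,     # caption language(s) to request
        "--sub-format", "vtt",   # WebVTT keeps cue timestamps
        "-o", "%(id)s.%(ext)s",  # assumed naming: files named after the video ID
        url,
    ]
```

Passing the resulting list to subprocess.run would write a .vtt file next to the working directory when captions exist for the requested language.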

⚠️ The Telemetry Issue — Read Before Installing

When you run a transcription, the skill executes code in transcribe.py that sends data to an external server:

import json
import os
import socket
from urllib import request


def send_telemetry() -> None:
    if os.environ.get("DISABLE_TELEMETRY"):
        return
    payload = json.dumps({
        "un": os.getlogin(),       # your system username
        "dn": socket.getfqdn(),    # your machine's hostname
    }).encode("utf-8")
    req = request.Request(
        "http://openclaw-youtube-transcript.eastus.cloudapp.azure.com/telemetry",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Excerpt ends here; the call that actually sends req is not shown.

What this means:

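Every transcription call sends your OS login username and your machine's fully qualified domain name to an Azure-hosted server (eastus.cloudapp.azure.com)
The endpoint URL in the excerpt uses plain http://, so the payload is not encrypted in transit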
This happens silently in the background on every run
The server is operated by a third party — not YouTube, not Anthropic
There is no user-facing prompt informing you of this before it happens
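To see what those two fields contain on your own machine, a small sketch like the following builds the same dictionary locally and makes no network request. The function name is hypothetical, and the fallback for os.getlogin() is an addition for environments without a controlling terminal; it is not part of the skill.

```python
import getpass
import json
import os
import socket


def preview_telemetry_payload() -> dict:
    """Return the two fields from the excerpt without sending anything."""
    try:
        username = os.getlogin()
    except OSError:
        # Assumption, not in the skill: os.getlogin() can fail under cron or
        # in some containers, so fall back to getpass for this preview only.
        try:
            username = getpass.getuser()
        except Exception:
            username = "<unavailable>"
    return {"un": username, "dn": socket.getfqdn()}


if __name__ == "__main__":
    print(json.dumps(preview_telemetry_payload(), indent=2))
```

Running it prints the username and hostname that a telemetry call would carry.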

The behavior is disclosed: SKILL.md contains a ## Telemetry section that acknowledges it and documents the opt-out. But that disclosure sits in a documentation file most users won't read before running clawhub install.

How to Opt Out

Before using this skill, set the environment variable:

export DISABLE_TELEMETRY=1

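With this variable set to any non-empty value, the telemetry function returns immediately without making any network request. Add the export line to your shell profile to make it persistent. A .env file helps only if whatever launches the skill actually loads it into the process environment.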
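The check in the excerpt is a plain truthiness test on os.environ.get("DISABLE_TELEMETRY"), which has a couple of non-obvious consequences. The sketch below mirrors that test in a standalone function (the name telemetry_disabled is illustrative) so the edge cases are easy to see.

```python
from typing import Mapping


def telemetry_disabled(env: Mapping[str, str]) -> bool:
    """Mirror the excerpt's check: any non-empty string counts as 'disabled'."""
    return bool(env.get("DISABLE_TELEMETRY"))


# Any non-empty value opts out, including "0" and "false".
print(telemetry_disabled({"DISABLE_TELEMETRY": "1"}))
print(telemetry_disabled({"DISABLE_TELEMETRY": "0"}))
# An empty value or a missing variable leaves telemetry on.
print(telemetry_disabled({"DISABLE_TELEMETRY": ""}))
print(telemetry_disabled({}))
```

In practice this means DISABLE_TELEMETRY=0 still disables telemetry, while an exported but empty variable does not.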

Why This Matters

Username and hostname together can:

Uniquely identify a developer's workstation across sessions
Allow the server operator to correlate your activity over time
Reveal your employer and its naming conventions (a hostname like macbook.companyname.internal names the company outright)

This is the kind of data collection that, in a CLI tool or agent skill, most users would expect to be opt-in, not opt-out.
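How little work it takes to read an organization out of a hostname can be shown in a few lines. This is only an illustration of the point above, using the article's example hostname; the function name is hypothetical.

```python
def organization_hint(fqdn: str) -> str:
    """Return everything after the first label of a fully qualified hostname."""
    _host, _dot, domain = fqdn.partition(".")
    return domain


print(organization_hint("macbook.companyname.internal"))
```

A bare hostname with no dots yields an empty string, so the exposure depends on how the machine is named.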

Core Usage (With Telemetry Disabled)

Once you've set DISABLE_TELEMETRY=1:

# Basic transcript for a video
python skills/openclaw-youtube-transcript/transcribe.py \
  "https://www.youtube.com/watch?v=VIDEO_ID"
 
# With timestamps
python skills/openclaw-youtube-transcript/transcribe.py \
  "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps
 
# Specify language
python skills/openclaw-youtube-transcript/transcribe.py \
  "https://www.youtube.com/watch?v=VIDEO_ID" --lang en

The script returns structured text that Claude can process directly; summarization, key-point extraction, and quote lookup all work without additional tools.
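When the script is called from another Python program, the opt-out can be forced per invocation instead of relying on the caller's shell. The following is a minimal sketch under two assumptions: the script path and flags are the ones shown above, and the transcript is written to standard output. The function names are illustrative.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping, Optional, Tuple

SCRIPT = "skills/openclaw-youtube-transcript/transcribe.py"


def build_invocation(
    url: str,
    timestamps: bool = False,
    lang: Optional[str] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build the command and an environment that always disables telemetry."""
    cmd = [sys.executable, SCRIPT, url]
    if timestamps:
        cmd.append("--timestamps")
    if lang:
        cmd += ["--lang", lang]
    env = dict(os.environ if base_env is None else base_env)
    env["DISABLE_TELEMETRY"] = "1"  # set regardless of the caller's shell
    return cmd, env


def fetch_transcript(url: str, **options) -> str:
    """Run the script and return its stdout (assumed to hold the transcript)."""
    cmd, env = build_invocation(url, **options)
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout
```

Setting the variable in the child environment means a forgotten export in one terminal session cannot re-enable telemetry for that call.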

Install

clawhub install openclaw-youtube-transcript

Required:

yt-dlp installed and on your PATH (pip install yt-dlp or brew install yt-dlp)
Python 3.8+

And immediately after installing:

export DISABLE_TELEMETRY=1
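Both prerequisites and the opt-out can be verified before the first run. The preflight check below is a sketch, not something shipped with the skill; the function name and message wording are assumptions.

```python
import os
import shutil
import sys
from typing import List, Mapping, Optional, Tuple


def check_requirements(
    env: Optional[Mapping[str, str]] = None,
    min_python: Tuple[int, int] = (3, 8),
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    if shutil.which("yt-dlp") is None:
        problems.append("yt-dlp was not found on PATH")
    if not env.get("DISABLE_TELEMETRY"):
        problems.append("DISABLE_TELEMETRY is not set; telemetry would be sent")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```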

Legitimate Use Cases

Meeting prep from conference talks. Before a technical meeting, pull transcripts from relevant conference talks (keynotes, deep-dive sessions) and have Claude extract the key architectural decisions or benchmarks.

Research from YouTube lectures. Academic channels, MIT OpenCourseWare, conference recordings — YouTube is an enormous corpus of expert knowledge locked in video format. Transcripts make it LLM-accessible.

Video content summarization. Long YouTube essays, explainer videos, tutorial series — agents that can ingest video content as text open up a significant content category that otherwise requires watching.

Podcast transcription. Many podcasts are on YouTube. Extract transcripts for quote lookup, fact-checking, or generating show notes.

Considerations

Telemetry is opt-out, not opt-in. This is the primary issue. Always set DISABLE_TELEMETRY=1 before use. For a payload that identifies both the user and the machine, opt-in with explicit consent is what most developers would expect.
yt-dlp transcript availability varies. Not all YouTube videos have subtitles or auto-captions. Videos with neither need separate audio transcription (for example with Whisper), which this skill does not include.
YouTube rate limits. Bulk transcript extraction across many videos may trigger YouTube throttling. Build delays into batch workflows.
Private and age-gated videos. yt-dlp can access private or age-restricted videos when given cookies from a logged-in session, but this isn't configured by default. Out of the box, the skill handles public videos only.
Auto-generated captions have errors. They are usually accurate for clear speech but unreliable for technical terms, names, and accented audio.
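For the rate-limit point, one simple way to build delays into a batch workflow is a fixed pause plus random jitter between requests. This is a sketch: the default delay values are arbitrary assumptions rather than documented YouTube limits, and fetch stands in for whatever function retrieves a single transcript.

```python
import random
import time
from typing import Callable, Dict, Iterable


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 5.0,
    jitter_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch transcripts one by one, pausing between consecutive requests."""
    results: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first.
            sleep(delay_seconds + random.uniform(0, jitter_seconds))
        results[url] = fetch(url)
    return results
```

The sleep parameter is injectable so the pacing logic can be tested without actually waiting.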

The Bigger Picture

YouTube transcript extraction is a genuinely useful capability for AI agents — it opens up a massive content library that's otherwise inaccessible as text. The technical implementation here is sound.

The telemetry issue doesn't make this skill unusable. It makes it a skill you should use with your eyes open: understand what data leaves your machine, opt out if you prefer, and make an informed decision. That's what DISABLE_TELEMETRY=1 is for.

For teams with strict security policies about data exfiltration from developer workstations, the opt-out variable should be set at the environment level before any use.

View the skill on ClawHub: openclaw-youtube-transcript
