**captions**: Extract closed captions and subtitles from YouTube videos. Use when the user asks for captions, closed captions, CC, accessibility text, or wants to read what was said in a video. Supports timestamps and multiple languages. Great for deaf/HoH accessibility, content review, quoting, and translation.
Install via ClawdBot CLI:

```shell
clawdbot install therohitdas/captions
```

Extract closed captions from YouTube videos via TranscriptAPI.com.
If $TRANSCRIPT_API_KEY is not set, help the user create an account (100 free credits, no card):
Step 1: Register. Ask the user for their email.

```shell
node ./scripts/tapi-auth.js register --email USER_EMAIL
```

An OTP is sent to that email. Ask the user: _"Check your email for a 6-digit verification code."_

Step 2: Verify. Once the user provides the OTP:

```shell
node ./scripts/tapi-auth.js verify --token TOKEN_FROM_STEP_1 --otp CODE
```
The verify and save-key commands save the API key to ~/.openclaw/openclaw.json (setting skills.entries.transcriptapi.apiKey and enabled: true). The existing file is backed up to ~/.openclaw/openclaw.json.bak before modification.

Manual option: sign up at transcriptapi.com/signup, then go to Dashboard, then API Keys.
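Based on the keys named above, the saved config file presumably looks something like this. Only the skills.entries.transcriptapi.apiKey path and the enabled flag are confirmed by this document; the surrounding layout is an assumption:

```json
{
  "skills": {
    "entries": {
      "transcriptapi": {
        "apiKey": "<your-api-key>",
        "enabled": true
      }
    }
  }
}
```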
To use the API key in a terminal or CLI outside the agent, add it to your shell profile manually:

```shell
export TRANSCRIPT_API_KEY="<your-api-key>"
```
```shell
curl -s "https://transcriptapi.com/api/v2/youtube/transcript?video_url=VIDEO_URL&format=json&include_timestamp=true&send_metadata=true" \
  -H "Authorization: Bearer $TRANSCRIPT_API_KEY"
```
| Param | Required | Default | Values |
| ------------------- | -------- | ------- | ----------------------------------- |
| video_url | yes | (none) | YouTube URL or video ID |
| format | no | json | json (structured), text (plain) |
| include_timestamp | no | true | true, false |
| send_metadata | no | false | true, false |
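As a sketch of how a client might assemble this request outside of curl: the endpoint and parameter names come from the table above, but the helper function itself is hypothetical, and the lowercase true/false encoding of the boolean parameters is an assumption based on the listed values.

```python
from urllib.parse import urlencode

BASE = "https://transcriptapi.com/api/v2/youtube/transcript"

def build_transcript_url(video_url, fmt="json",
                         include_timestamp=True, send_metadata=False):
    """Assemble the GET URL for the transcript endpoint (hypothetical helper)."""
    params = {
        "video_url": video_url,
        "format": fmt,
        # Assumed: the API takes lowercase "true"/"false" strings.
        "include_timestamp": str(include_timestamp).lower(),
        "send_metadata": str(send_metadata).lower(),
    }
    return f"{BASE}?{urlencode(params)}"

url = build_transcript_url("dQw4w9WgXcQ", send_metadata=True)
```

The Authorization header would still be supplied by whatever HTTP client performs the call, exactly as in the curl example.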
Response (format=json; best for accessibility/timing):

```json
{
  "video_id": "dQw4w9WgXcQ",
  "language": "en",
  "transcript": [
    { "text": "We're no strangers to love", "start": 18.0, "duration": 3.5 },
    { "text": "You know the rules and so do I", "start": 21.5, "duration": 2.8 }
  ],
  "metadata": { "title": "...", "author_name": "...", "thumbnail_url": "..." }
}
```
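The relationship between the structured entries and the timestamped text form can be sketched as follows. This is a minimal converter, not part of the API; the helper names are mine, and fractional seconds are simply truncated:

```python
def seconds_to_stamp(seconds):
    """Render a start offset in seconds as [HH:MM:SS], truncating fractions."""
    s = int(seconds)
    return f"[{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}]"

def entries_to_text(entries):
    """Join structured caption entries into the timestamped plain-text form."""
    return "\n".join(f"{seconds_to_stamp(e['start'])} {e['text']}" for e in entries)

entries = [
    {"text": "We're no strangers to love", "start": 18.0, "duration": 3.5},
    {"text": "You know the rules and so do I", "start": 21.5, "duration": 2.8},
]
```

Running entries_to_text on the sample entries above reproduces the [00:00:18] / [00:00:21] lines shown in the text-format response.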
- `start`: seconds from the start of the video
- `duration`: how long the caption is displayed

Response (format=text; readable):

```json
{
  "video_id": "dQw4w9WgXcQ",
  "language": "en",
  "transcript": "[00:00:18] We're no strangers to love\n[00:00:21] You know the rules..."
}
```
Use format=json for synced captions (accessibility tools, timing analysis). Use format=text with include_timestamp=false for clean reading.

Error codes:

| Code | Meaning | Action |
| ---- | ----------- | ----------------------------- |
| 402 | No credits | transcriptapi.com/billing |
| 404 | No captions | Video doesn't have CC enabled |
| 408 | Timeout | Retry once after 2s |
1 credit per request. Free tier: 100 credits, 300 req/min.
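The 408 guidance above (retry once after 2 seconds) could be wrapped like this. The fetch callable is a stand-in for whatever function actually performs the HTTP request; only the single-retry policy comes from the table:

```python
import time

def with_single_retry(fetch, retry_status=408, delay=2.0):
    """Call fetch(); on the retryable status, wait and try exactly once more."""
    status, body = fetch()
    if status == retry_status:
        time.sleep(delay)
        status, body = fetch()
    return status, body

# Stand-in fetcher that times out once, then succeeds.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    return (408, None) if calls["n"] == 1 else (200, {"transcript": []})
```

Retrying only once keeps the cost bounded at 1 extra credit per timed-out request.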
Generated Mar 1, 2026
This scenario involves providing closed captions for YouTube videos to ensure accessibility for deaf and hard of hearing individuals. It enables real-time reading of spoken content, enhancing inclusivity in educational and entertainment settings. The skill supports multiple languages and timestamps for precise synchronization.
Journalists can use this skill to extract accurate captions from video interviews or news clips for fact-checking and quoting purposes. It helps in verifying statements and creating written summaries without manual transcription, saving time and reducing errors. The JSON format with timestamps allows easy reference to specific moments.
Businesses can leverage this skill to extract captions in one language and translate them for global audiences, aiding in video localization. It supports multiple languages, making it useful for content creators and marketers targeting international markets. The structured JSON output facilitates integration with translation tools.
Researchers can analyze video content by extracting captions to study linguistic patterns, sentiment, or thematic elements in educational or documentary videos. The skill provides timestamps for temporal analysis and metadata for contextual understanding, enhancing qualitative research methods.
Video editors and producers can use this skill to generate captions for post-production, ensuring compliance with accessibility standards and improving viewer engagement. It allows for easy integration into editing software, with options for plain text or structured formats to suit different workflows.
Offer a free tier with 100 credits to attract users, then charge for additional credits or premium features like faster processing or higher accuracy. This model encourages adoption while generating revenue from heavy users such as businesses and content creators. It can scale based on usage volume and support tiers.
Partner with video platforms, educational institutions, or media companies to integrate the caption extraction service directly into their systems. This provides a steady revenue stream through licensing fees or revenue-sharing agreements. It targets industries with high demand for accessibility and content management.
Sell white-labeled versions of the skill to digital agencies or accessibility consultants, allowing them to offer caption services under their own brand. This model leverages existing client networks and provides customization options, generating income through setup fees and ongoing support contracts.
Integration Tip: Ensure TRANSCRIPT_API_KEY is stored securely in the environment or configuration file, and test the API with sample videos to verify response formats before full deployment.