tubescribe
YouTube video summarizer with speaker detection, formatted documents, and audio output. Works out of the box with macOS built-in TTS. Optional recommended tools (pandoc, ffmpeg, mlx-audio) enhance quality. Requires internet for YouTube access. No paid APIs or subscriptions. Use when user sends a YouTube URL or asks to summarize/transcribe a YouTube video.
Install via ClawdBot CLI:
clawdbot install matusvojtek/tubescribe
Turn any YouTube video into a polished document + audio summary.
Drop a YouTube link and get a beautiful transcript with speaker labels, key quotes, timestamps that link back to the video, and an audio summary you can listen to on the go.
When user sends a YouTube URL:
DO NOT BLOCK: spawn and move on instantly.
Run setup to check dependencies and configure defaults:
python skills/tubescribe/scripts/setup.py
This checks: summarize CLI, pandoc, ffmpeg, Kokoro TTS
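A dependency check of this kind can be sketched with the standard library. This is a minimal illustration, not the actual `setup.py`: the function names `check_tools` and `missing_tools` are hypothetical, and the assumption is that each tool is detected by looking up its executable on `PATH` (Kokoro TTS is located through a configured path instead, so it is left out here).

```python
import shutil

# Tool names come from the setup description above; the exact executable
# names that setup.py looks for are an assumption.
DEFAULT_TOOLS = ["summarize", "pandoc", "ffmpeg"]


def check_tools(tools=DEFAULT_TOOLS):
    """Return a dict mapping each tool name to its resolved path, or None if missing."""
    return {name: shutil.which(name) for name in tools}


def missing_tools(tools=DEFAULT_TOOLS):
    """List the tools that are not on PATH, preserving the input order."""
    return [name for name, path in check_tools(tools).items() if path is None]


if __name__ == "__main__":
    for name, path in check_tools().items():
        print(f"{name}: {path or 'missing'}")
```

A caller could stop and report `missing_tools()` to the user rather than installing anything, which matches the "do not install" rule given to the sub-agent.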
Spawn ONE sub-agent that does the entire pipeline:
sessions_spawn(
task=f"""
## TubeScribe: Process {youtube_url}
⚠️ CRITICAL: Do NOT install any software.
No pip, brew, curl, venv, or binary downloads.
If a tool is missing, STOP and report what's needed.
Run the COMPLETE pipeline. Do not stop until all steps are done.
### Step 1: Extract
python3 skills/tubescribe/scripts/tubescribe.py "{youtube_url}"
Note the **Source** and **Output** paths printed by the script. Use those exact paths in subsequent steps.
### Step 2: Read source JSON
Read the Source path from Step 1 output and note:
- metadata.title (for filename)
- metadata.video_id
- metadata.channel, upload_date, duration_string
### Step 3: Create formatted markdown
Write to the Output path from Step 1:
1. `# **<title>**`
---
2. Video info block: Channel, Date, Duration, URL (clickable). Empty line between each field.
---
3. `## **Participants**`: table with bold headers:
| Name | Role | Description |
|----------|----------|-----------------|
---
4. `## **Summary**`: 3-5 paragraphs of prose
---
5. `## **Key Quotes**`: 5 best with clickable YouTube timestamps. Format each as:
"Quote text here." - 12:34
"Another quote." - 25:10
Use regular dash `-`, NOT em dash `—`. Do NOT use blockquotes `>`. Plain paragraphs only.
---
6. `## **Viewer Sentiment**` (if comments exist)
---
7. `## **Best Comments**` (if comments exist): Top 5, NO lines between them:
Comment text here.
- ▲ 123 @AuthorName
Next comment text here.
- ▲ 45 @AnotherAuthor
Attribution line: dash + italic. Just blank line between comments, NO `---` separators.
---
8. `## **Full Transcript**`: merge segments, speaker labels, clickable timestamps
### Step 4: Create DOCX
Clean the title for filename (remove special chars), then:
pandoc
### Step 5: Generate audio
Write the summary text to a temp file, then use TubeScribe's built-in audio generation:
python3 -c "
text = '''YOUR SUMMARY TEXT HERE'''
with open('
f.write(text)
"
python3 skills/tubescribe/scripts/tubescribe.py \
--generate-audio
--audio-output ~/Documents/TubeScribe/
This reads `~/.tubescribe/config.json` and uses the configured TTS engine (mlx/kokoro/builtin), voice blend, and speed automatically. Output format (mp3/wav) comes from config.
### Step 6: Cleanup
python3 skills/tubescribe/scripts/tubescribe.py --cleanup
### Step 7: Open folder
open ~/Documents/TubeScribe/
### Report
Tell what was created: DOCX name, MP3 name + duration, video stats.
""",
label="tubescribe",
runTimeoutSeconds=900,
cleanup="delete"
)
After spawning, reply immediately:
TubeScribe is processing - I'll let you know when it's ready!
Then continue the conversation. The sub-agent notification announces completion.
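Step 3 of the prompt above asks for clickable YouTube timestamps in the Key Quotes and Full Transcript sections. One way to build such links is sketched below. The helper names (`timestamp_to_seconds`, `timestamp_link`, `format_quote`) are hypothetical and not part of TubeScribe; the sketch relies on the `t=<seconds>s` query parameter that YouTube watch URLs accept.

```python
def timestamp_to_seconds(stamp: str) -> int:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in stamp.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def timestamp_link(video_id: str, stamp: str) -> str:
    """Build a Markdown link that opens the video at the given timestamp."""
    offset = timestamp_to_seconds(stamp)
    url = f"https://www.youtube.com/watch?v={video_id}&t={offset}s"
    return f"[{stamp}]({url})"


def format_quote(text: str, video_id: str, stamp: str) -> str:
    """Format one Key Quotes entry: quoted text, regular dash, clickable timestamp."""
    return f'"{text}" - {timestamp_link(video_id, stamp)}'
```

For example, `format_quote("Quote text here.", "VIDEO_ID", "12:34")` yields a plain paragraph ending in a link to second 754 of the video, using a regular dash as the prompt requires.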
Config file: ~/.tubescribe/config.json
{
"output": {
"folder": "~/Documents/TubeScribe",
"open_folder_after": true,
"open_document_after": false,
"open_audio_after": false
},
"document": {
"format": "docx",
"engine": "pandoc"
},
"audio": {
"enabled": true,
"format": "mp3",
"tts_engine": "mlx"
},
"mlx_audio": {
"path": "~/.openclaw/tools/mlx-audio",
"model": "mlx-community/Kokoro-82M-bf16",
"voice": "af_heart",
"lang_code": "a",
"speed": 1.05
},
"kokoro": {
"path": "~/.openclaw/tools/kokoro",
"voice_blend": { "af_heart": 0.6, "af_sky": 0.4 },
"speed": 1.05
},
"processing": {
"subagent_timeout": 600,
"cleanup_temp_files": true
}
}
| Option | Default | Description |
|--------|---------|-------------|
| output.folder | ~/Documents/TubeScribe | Where to save files |
| output.open_folder_after | true | Open output folder when done |
| output.open_document_after | false | Auto-open generated document |
| output.open_audio_after | false | Auto-open generated audio summary |
| Option | Default | Values | Description |
|--------|---------|--------|-------------|
| document.format | docx | docx, html, md | Output format |
| document.engine | pandoc | pandoc | Converter for DOCX (falls back to HTML) |
| Option | Default | Values | Description |
|--------|---------|--------|-------------|
| audio.enabled | true | true, false | Generate audio summary |
| audio.format | mp3 | mp3, wav | Audio format (mp3 needs ffmpeg) |
| audio.tts_engine | mlx | mlx, kokoro, builtin | TTS engine (mlx = fastest on Apple Silicon) |
| Option | Default | Description |
|--------|---------|-------------|
| mlx_audio.path | ~/.openclaw/tools/mlx-audio | mlx-audio venv location |
| mlx_audio.model | mlx-community/Kokoro-82M-bf16 | MLX model to use |
| mlx_audio.voice | af_heart | Voice preset (used if no voice_blend) |
| mlx_audio.voice_blend | {af_heart: 0.6, af_sky: 0.4} | Custom voice mix (weighted blend) |
| mlx_audio.lang_code | a | Language code (a=US English) |
| mlx_audio.speed | 1.05 | Playback speed (1.0 = normal, 1.05 = 5% faster) |
| Option | Default | Description |
|--------|---------|-------------|
| kokoro.path | ~/.openclaw/tools/kokoro | Kokoro repo location |
| kokoro.voice_blend | {af_heart: 0.6, af_sky: 0.4} | Custom voice mix |
| kokoro.speed | 1.05 | Playback speed (1.0 = normal, 1.05 = 5% faster) |
| Option | Default | Description |
|--------|---------|-------------|
| processing.subagent_timeout | 600 | Seconds for sub-agent (increase for long videos) |
| processing.cleanup_temp_files | true | Remove /tmp files after completion |
| Option | Default | Description |
|--------|---------|-------------|
| comments.max_count | 50 | Number of comments to fetch |
| comments.timeout | 90 | Timeout for comment fetching (seconds) |
| Option | Default | Description |
|--------|---------|-------------|
| queue.stale_minutes | 30 | Consider a processing job stale after this many minutes |
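The options above live in nested JSON sections, so a partial user config has to be layered over the defaults section by section. The following is a minimal sketch of that idea, assuming a recursive merge; `DEFAULTS` holds only a subset of the documented values, and `deep_merge` and `load_config` are hypothetical names rather than TubeScribe's actual loader.

```python
import copy
import json
from pathlib import Path

# A subset of the defaults documented above, kept short for illustration.
DEFAULTS = {
    "output": {"folder": "~/Documents/TubeScribe", "open_folder_after": True},
    "document": {"format": "docx", "engine": "pandoc"},
    "audio": {"enabled": True, "format": "mp3", "tts_engine": "mlx"},
    "processing": {"subagent_timeout": 600, "cleanup_temp_files": True},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values in override win, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(path="~/.tubescribe/config.json") -> dict:
    """Load the user config if present and layer it over DEFAULTS."""
    config_path = Path(path).expanduser()
    if not config_path.is_file():
        return copy.deepcopy(DEFAULTS)
    with config_path.open(encoding="utf-8") as handle:
        return deep_merge(DEFAULTS, json.load(handle))
```

With this approach a config file containing only `{"audio": {"format": "wav"}}` changes the audio format while `audio.enabled` and `audio.tts_engine` keep their defaults.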
~/Documents/TubeScribe/
├── {Video Title}.html # Formatted document (or .docx / .md)
└── {Video Title}_summary.mp3 # Audio summary (or .wav)
After generation, opens the folder (not individual files) so you can access everything.
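The file names in this layout derive from the video title, which first has to be made safe for the filesystem. The sketch below shows one possible rule. The regular expression is an assumption (the source only says "remove special chars"), and `clean_title` and `output_paths` are hypothetical helpers.

```python
import re
from pathlib import Path


def clean_title(title: str) -> str:
    """Drop characters that are awkward in filenames and collapse whitespace."""
    # Keep word characters, whitespace, dots and dashes; remove everything else.
    cleaned = re.sub(r"[^\w\s.-]", "", title)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned or "untitled"


def output_paths(title, folder="~/Documents/TubeScribe",
                 doc_format="docx", audio_format="mp3"):
    """Return (document_path, audio_path) following the layout shown above."""
    base = Path(folder).expanduser()
    name = clean_title(title)
    return base / f"{name}.{doc_format}", base / f"{name}_summary.{audio_format}"
```

Falling back to `untitled` avoids producing an empty file name when a title consists only of punctuation or symbols.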
Required:
- summarize CLI: brew install steipete/tap/summarize
Optional (better quality):
- pandoc (DOCX output): brew install pandoc
- ffmpeg (MP3 audio): brew install ffmpeg
- yt-dlp (YouTube comments): brew install yt-dlp
- mlx-audio: pip install mlx-audio (uses MLX backend for Kokoro)
TubeScribe checks these locations for yt-dlp (in order):
| Priority | Path | Source |
|----------|------|--------|
| 1 | which yt-dlp | System PATH |
| 2 | /opt/homebrew/bin/yt-dlp | Homebrew (Apple Silicon) |
| 3 | /usr/local/bin/yt-dlp | Homebrew (Intel) / Linux |
| 4 | ~/.local/bin/yt-dlp | pip install --user |
| 5 | ~/.local/pipx/venvs/yt-dlp/bin/yt-dlp | pipx |
| 6 | ~/.openclaw/tools/yt-dlp/yt-dlp | TubeScribe auto-install |
If not found, setup downloads a standalone binary to the tools directory.
The tools directory version doesn't conflict with system installations.
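The lookup order in the table can be expressed as a short search routine. This is a sketch under stated assumptions, not TubeScribe's implementation: `find_yt_dlp` is a hypothetical name, and the `which` parameter exists only so the `PATH` lookup can be replaced in tests.

```python
import os
import shutil
from pathlib import Path

# Fallback locations from the table above, in priority order (after PATH).
FALLBACK_PATHS = [
    "/opt/homebrew/bin/yt-dlp",
    "/usr/local/bin/yt-dlp",
    "~/.local/bin/yt-dlp",
    "~/.local/pipx/venvs/yt-dlp/bin/yt-dlp",
    "~/.openclaw/tools/yt-dlp/yt-dlp",
]


def find_yt_dlp(fallbacks=FALLBACK_PATHS, which=shutil.which):
    """Return the first usable yt-dlp path, or None if none is found."""
    on_path = which("yt-dlp")  # priority 1: system PATH
    if on_path:
        return on_path
    for candidate in fallbacks:  # priorities 2-6: known install locations
        path = Path(candidate).expanduser()
        if path.is_file() and os.access(path, os.X_OK):
            return str(path)
    return None
```

Checking `PATH` first means a user-managed installation always wins over the copy in the tools directory.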
When user sends multiple YouTube URLs while one is processing:
python skills/tubescribe/scripts/tubescribe.py --queue-status
# Add to queue instead of starting parallel processing
python skills/tubescribe/scripts/tubescribe.py --queue-add "NEW_URL"
# → Replies: "Added to queue (position 2)"
# Check if more in queue
python skills/tubescribe/scripts/tubescribe.py --queue-next
# → Automatically pops and processes next URL
| Command | Description |
|---------|-------------|
| --queue-status | Show what's processing + queued items |
| --queue-add URL | Add URL to queue |
| --queue-next | Process next item from queue |
| --queue-clear | Clear entire queue |
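The queue behaviour described by these commands can be modelled as a FIFO with a single active job. The class below is an illustrative in-memory sketch, not the script's own storage format: `UrlQueue` and its methods are hypothetical, and it assumes that the reported position counts the job currently being processed and that staleness follows `queue.stale_minutes`.

```python
from collections import deque
from datetime import datetime, timedelta


class UrlQueue:
    """In-memory FIFO queue with a single 'processing' slot."""

    def __init__(self, stale_minutes=30):
        self.stale_after = timedelta(minutes=stale_minutes)
        self.pending = deque()
        self.current = None     # URL being processed, if any
        self.started_at = None  # when the current job started

    def add(self, url):
        """Queue a URL and return its 1-based position, counting the active job."""
        self.pending.append(url)
        return len(self.pending) + (1 if self.current else 0)

    def next(self, now=None):
        """Pop the next URL and mark it as processing; None when the queue is empty."""
        if not self.pending:
            self.current, self.started_at = None, None
            return None
        self.current = self.pending.popleft()
        self.started_at = now or datetime.now()
        return self.current

    def is_stale(self, now=None):
        """True if the active job has been running longer than the stale limit."""
        if self.current is None:
            return False
        return (now or datetime.now()) - self.started_at > self.stale_after

    def clear(self):
        """Drop all queued URLs without touching the active job."""
        self.pending.clear()
```

Under these assumptions, adding a URL while another video is processing reports position 2, as in the example reply above.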
python skills/tubescribe/scripts/tubescribe.py url1 url2 url3
Processes all URLs sequentially with a summary at the end.
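Sequential batch processing with a closing summary might look like the sketch below. The names `process_batch` and `summarize_batch` are hypothetical, `process_one` stands in for whatever handles a single video, and the summary wording is invented for illustration.

```python
def process_batch(urls, process_one):
    """Run process_one on each URL in order; collect successes and failures."""
    succeeded, failed = [], []
    for url in urls:
        try:
            succeeded.append((url, process_one(url)))
        except Exception as exc:  # record the error and move on to the next video
            failed.append((url, str(exc)))
    return succeeded, failed


def summarize_batch(succeeded, failed):
    """Build a short plain-text summary of a finished batch."""
    total = len(succeeded) + len(failed)
    lines = [f"Processed {total} URLs: {len(succeeded)} ok, {len(failed)} failed"]
    lines += [f"  failed: {url} ({reason})" for url, reason in failed]
    return "\n".join(lines)
```

Catching errors per URL keeps one failing video from aborting the rest of the batch.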
The script detects and reports these errors with clear messages:
| Error | Message |
|-------|---------|
| Invalid URL | Not a valid YouTube URL |
| Private video | Video is private - can't access |
| Video removed | Video not found or removed |
| No captions | No captions available for this video |
| Age-restricted | Age-restricted video - can't access without login |
| Region-blocked | Video blocked in your region |
| Live stream | Live streams not supported - wait until it ends |
| Network error | Network error - check your connection |
| Timeout | Request timed out - try again later |
When an error occurs, report it to the user and don't proceed with that video.
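The "Invalid URL" case can be detected before any network request is made. The following sketch shows one way to do that. `extract_video_id` is a hypothetical helper, the set of accepted hosts and path forms is an assumption, and so is the 11-character ID pattern, which reflects the format YouTube currently uses.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from this alphabet.
VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com"}


def extract_video_id(url: str):
    """Return the video ID, or None if the URL is not a YouTube video URL."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
                candidate = parts[1]
    if candidate and VIDEO_ID.fullmatch(candidate):
        return candidate
    return None
```

A caller would report "Not a valid YouTube URL" whenever this returns `None` and skip that video, in line with the rule above.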
tubescribe url1 url2 url3
Generated Mar 1, 2026
Researchers analyzing YouTube lectures or interviews can quickly generate transcripts with speaker labels and key quotes for citation and analysis. The audio summary allows for on-the-go review of content, enhancing productivity in literature reviews or data extraction.
Content creators repurposing YouTube videos into blog posts, podcasts, or social media snippets can use TubeScribe to extract structured transcripts and summaries. The formatted documents and audio output streamline content adaptation without manual transcription.
Organizations using YouTube tutorials or webinars for employee training can generate searchable transcripts and audio summaries for reference. This aids in knowledge retention and accessibility, especially for remote teams.
Journalists covering video interviews or press conferences can extract accurate quotes with timestamps for fact-checking and article writing. The sentiment analysis of comments provides additional context on public reception.
TubeScribe can make content accessible to individuals with hearing impairments by converting YouTube videos into text transcripts and audio summaries. This supports compliance with accessibility standards and enhances inclusivity.
Offer a free basic version with limited features and a paid tier for advanced capabilities like batch processing, custom formatting, or API access. Revenue can come from monthly subscriptions or enterprise licenses.
License TubeScribe as a white-label solution for companies needing video summarization tools integrated into their platforms, such as e-learning systems or content management systems. Revenue is generated through licensing fees and support contracts.
Promote related tools like pandoc or ffmpeg through affiliate links within the setup or documentation. Revenue comes from commissions on sales generated from referrals, while keeping the core tool free.
Integration Tip
Ensure the summarize CLI is properly installed and configured before use; test with a sample YouTube URL to verify all dependencies are met for seamless operation.