yt-to-blog

Full content pipeline: YouTube URL → transcript → blog post → Substack draft → X/Twitter thread → vertical video clips via HeyGen AI avatar. One URL in, entire content suite out. Use when asked to: "turn this video into content", "create a content suite from this YouTube video", "write a blog from this video", "repurpose this video", or any video-to-multi-platform content request. Can run the full pipeline or individual steps.

Install via ClawdBot CLI:

clawdbot install justinhartbiz/yt-to-blog

YouTube URL → blog post + Substack + tweets + vertical video clips. The whole content machine.
YouTube URL
↓
① Transcript (summarize CLI)
↓
② Blog Draft (AI-written in your voice)
↓
③ Substack Publish (browser automation)
↓
④ X/Twitter Post (bird CLI)
↓
④b Facebook Group (optional reminder)
↓
⑤ Script Splitter (extract hook moments)
↓
⑥ HeyGen Videos (AI avatar vertical clips)
↓
⑦ Post-Processing (ffmpeg crop/scale)
↓
📁 Output Folder (blog.md, videos, tweet.txt, URLs)
One URL in → Five platforms out. Run the whole thing or any step individually.
Walk the user through this on first use. It takes ~10 minutes once, then never again.
Run the setup script to check what's installed:
bash skills/yt-content-engine/setup.sh
Required CLIs:
| Tool | Purpose | Install |
|------|---------|---------|
| summarize | YouTube transcript extraction | brew install steipete/tap/summarize |
| bird | X/Twitter posting | brew install steipete/tap/bird |
| ffmpeg | Video post-processing | brew install ffmpeg |
| curl | API calls to HeyGen | Usually pre-installed on macOS |
| python3 | Helper scripts | Usually pre-installed on macOS |
If anything is missing, tell the user what to install and wait for confirmation.
Save the key to config.json (see config schema below), then verify it:

curl -s -H "X-Api-Key: API_KEY_HERE" https://api.heygen.com/v2/avatars | python3 -c "import sys,json; d=json.load(sys.stdin); print('✅ API key works!' if 'data' in d else '❌ Invalid key')"
Tell the user:
"For vertical video clips, you need a HeyGen avatar. Here's what matters:
>
Record in PORTRAIT mode (hold your phone vertically). This is critical — if you record landscape, the avatar will be a small strip in the center of a 9:16 frame and we'll need to crop/scale it (which works but loses quality).
>
Go to https://app.heygen.com/avatars → Create Instant Avatar → follow their recording guide. Stand in good lighting, look at camera, speak naturally for 2+ minutes.
>
Once created, grab your Avatar ID from the avatar details page."
List their existing avatars to help them pick. Note: the avatars endpoint returns both custom and stock avatars — filter for the user's custom ones (they typically appear first and have personal names):
curl -s -H "X-Api-Key: API_KEY" https://api.heygen.com/v2/avatars | python3 -c "
import sys, json
data = json.load(sys.stdin)
for a in data.get('data', {}).get('avatars', []):
    print(f\"  {a['avatar_id']} — {a.get('avatar_name', 'unnamed')}\")
"
Tell the user:
"Go to https://app.heygen.com/voice-clone → Clone your voice. Upload a clean audio sample (1-2 min of you speaking naturally). HeyGen will create a voice ID.
>
Once done, grab your Voice ID from the voice settings."
List their voices. User's cloned voices typically appear first; stock voices come after:
curl -s -H "X-Api-Key: API_KEY" https://api.heygen.com/v2/voices | python3 -c "
import sys, json
data = json.load(sys.stdin)
for v in data.get('data', {}).get('voices', []):
    print(f\"  {v['voice_id']} — {v.get('name', 'unnamed')} ({v.get('language', '?')})\")
"
⚠️ IMPORTANT: Use the FULL voice_id (e.g., 69da9c9bca78499b98fdac698d2a20cd), not a truncated version. The API will return "Voice validation failed" if you use a shortened ID.
Substack has no API — posting requires browser automation.
Open a persistent browser session with profile="openclaw", sign in at https://substack.com/sign-in, and store the publication domain in config.json. The browser session persists across restarts. One-time setup.
Create skills/yt-content-engine/config.json (relative to your workspace):
{
"heygen": {
"apiKey": "YOUR_API_KEY",
"avatarId": "YOUR_AVATAR_ID",
"voiceId": "YOUR_VOICE_ID"
},
"substack": {
"publication": "yourblog.substack.com"
},
"twitter": {
"handle": "@yourhandle"
},
"author": {
"voice": "Description of your writing voice and style",
"name": "Your Name"
},
"video": {
"clipCount": 5,
"maxClipSeconds": 60,
"cropMode": "auto"
}
}
Tip: If the user already has a voice guide from the yt-to-blog skill, read it from skills/yt-to-blog/references/voice-guide.md and use it for the author.voice field.
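The required keys above can be validated up front so the pipeline fails fast with a clear message instead of breaking mid-run. A minimal sketch, assuming the schema above (the `load_config` helper and its default-merging behavior are illustrative, not part of the skill):

```python
import json

# Dotted paths that must be present and non-empty per the config schema
REQUIRED = [
    ("heygen", "apiKey"), ("heygen", "avatarId"), ("heygen", "voiceId"),
    ("substack", "publication"), ("twitter", "handle"),
]
VIDEO_DEFAULTS = {"clipCount": 5, "maxClipSeconds": 60, "cropMode": "auto"}

def load_config(path="skills/yt-content-engine/config.json"):
    """Load config.json, reject missing required keys, fill video defaults."""
    with open(path) as f:
        cfg = json.load(f)
    missing = [f"{s}.{k}" for s, k in REQUIRED if not cfg.get(s, {}).get(k)]
    if missing:
        raise ValueError(f"config.json missing required keys: {', '.join(missing)}")
    video = cfg.setdefault("video", {})
    for key, value in VIDEO_DEFAULTS.items():
        video.setdefault(key, value)
    return cfg
```

Explicit values in the file win; only absent video keys pick up the documented defaults.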
Run the setup script with the config in place:
bash skills/yt-content-engine/setup.sh
It will test each component and report status.
"Turn this into a full content suite: https://youtu.be/XXXXX"
"Content engine this video: [URL]"
"Run the full pipeline on [URL]"
"Just get me the transcript from [URL]"
"Write a blog post from [URL]" (steps 1-2)
"Post this to Substack" (step 3, after blog exists)
"Tweet about this blog post" (step 4)
"Generate video clips from this blog" (steps 5-7)
"Just split this into scripts" (step 5 only)
Create the output directory for this run, then fetch the YouTube transcript:
OUT_DIR="/tmp/yt-content-engine/output-$(date +%Y-%m-%d)"
mkdir -p "$OUT_DIR/scripts" "$OUT_DIR/videos"
summarize "YOUTUBE_URL" --extract > /tmp/yt-content-engine/transcript.txt
The --extract flag prints the raw transcript without LLM summarization. Read the output. If it fails (no captions available), try with --youtube yt-dlp for auto-generated captions, or tell the user and suggest they provide a manual transcript.
Transform the transcript into a polished long-form blog post.
Load the author voice from config.json → author.voice. If a more detailed voice guide exists at skills/yt-to-blog/references/voice-guide.md, read and use that too.
Analysis phase — before writing, extract from the transcript:
Writing structure:
Writing rules:
Generate 3-5 headline options with distinct strategies (contrast/irony, revelation, moral framing, callback). Each with a subtitle. Let the user pick.
Save the final draft to the output folder as blog.md.
Post the blog to Substack via browser automation.
Steps:
1. Read the publication domain from config.json → substack.publication.
2. Open the persistent browser session (profile="openclaw").
3. Navigate to https://PUBLICATION.substack.com/publish/post.
4. Copy the blog to the clipboard (pbcopy < /tmp/post.md), then paste into the editor (Meta+v).

Known issues:
- Em dashes (—) may garble as ,Äî during clipboard paste → find/replace after paste.
- The drafts dashboard lives at https://PUBLICATION.substack.com/publish.

Default: save as draft. Only publish if the user explicitly says "publish it" — always confirm first.
Save the Substack URL to output/substack-url.txt.
Compose and post using the bird CLI.
Compose the tweet/thread, mentioning the handle from config.json → twitter.handle where relevant.

Post with bird:
# Single tweet
bird tweet "Your tweet text here"
# Thread (post first tweet, then reply to it)
bird tweet "Tweet 1 text here"
# Note the returned tweet ID, then:
bird reply TWEET_ID "Tweet 2 text here"
# And chain:
bird reply TWEET_2_ID "Tweet 3 text here"
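The skill doesn't prescribe how to break a long post into a thread before chaining bird replies. One hedged approach is to split on sentence boundaries under the 280-character limit; `split_thread` is a hypothetical helper, and hard-trimming a single oversized sentence is a simplification:

```python
import re

def split_thread(text, limit=280):
    """Split long text into tweet-sized chunks on sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tweets, current = [], ""
    for s in sentences:
        candidate = f"{current} {s}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                tweets.append(current)
            current = s[:limit]  # hard-trim a single oversized sentence
    if current:
        tweets.append(current)
    return tweets
```

Each returned chunk becomes one `bird tweet` / `bird reply` call in order.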
Always show the user the tweet text before posting and get confirmation.
Save tweet text to output/tweet.txt.
If config.json includes a facebook.group URL, remind the user to post to their Facebook Group.
Note: Facebook Group API posting is heavily restricted. Browser automation is unreliable due to Facebook's anti-bot measures. Best approach:
Draft the post text, save it to output/facebook-post.txt, and ask the user to paste it into the group manually. This keeps Facebook distribution in the workflow without fighting their API restrictions.
Extract 3-5 "hook moments" from the blog post and rewrite each as a spoken-word script for vertical video.
What to look for (scan the blog for these patterns):
Not every blog will have all five. Extract what's there. Minimum 3 clips.
Rewrite rules for spoken delivery:
Format each script:
CLIP 1: [descriptive title]
---
[Script text here, 75-150 words]
Use config.json → video.clipCount for the target number of clips (default: 5).
Use config.json → video.maxClipSeconds for max duration (default: 60).
Save scripts to output/scripts/clip-1.txt, clip-2.txt, etc.
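The per-clip files can be produced mechanically from the `CLIP N:` / `---` format above. A sketch, assuming the scripts arrive as one formatted string (`save_clip_scripts` is a hypothetical helper, not part of the skill):

```python
import re
from pathlib import Path

def save_clip_scripts(formatted, out_dir="output/scripts"):
    """Parse 'CLIP N: title' blocks separated by '---' and write clip-N.txt files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # Each block: a header line, a '---' divider, then the script body
    blocks = re.findall(
        r"CLIP\s+(\d+):[^\n]*\n---\n(.*?)(?=\nCLIP\s+\d+:|\Z)", formatted, re.S)
    for num, body in blocks:
        path = out / f"clip-{num}.txt"
        path.write_text(body.strip() + "\n")
        written.append(str(path))
    return written
```

The clip number in the header drives the filename, so gaps or reordering in the draft carry through unchanged.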
Submit each script to HeyGen API v2 to generate AI avatar videos.
Read config:
# Parse config.json
API_KEY=$(python3 -c "import json; c=json.load(open('config.json')); print(c['heygen']['apiKey'])")
AVATAR_ID=$(python3 -c "import json; c=json.load(open('config.json')); print(c['heygen']['avatarId'])")
VOICE_ID=$(python3 -c "import json; c=json.load(open('config.json')); print(c['heygen']['voiceId'])")
For each script, submit a video generation request:
curl -s -X POST "https://api.heygen.com/v2/video/generate" \
-H "X-Api-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"video_inputs": [{
"character": {
"type": "avatar",
"avatar_id": "'"$AVATAR_ID"'",
"avatar_style": "normal"
},
"voice": {
"type": "text",
"input_text": "'"$(cat output/scripts/clip-1.txt)"'",
"voice_id": "'"$VOICE_ID"'"
}
}],
"dimension": {
"width": 1080,
"height": 1920
}
}'
Parse the response to get video_id:
import json
response = json.loads(response_text)
video_id = response["data"]["video_id"]
Submit ALL clips before polling. HeyGen renders in parallel — submit all scripts first, collect all video_ids, then poll them all. This cuts total render time from N×3min to ~3min.
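One caveat with the curl example above: interpolating `$(cat …)` into the JSON body breaks if a script contains double quotes or raw newlines. A safer sketch builds the body with `json.dumps`, which escapes both (`build_generate_payload` is illustrative; pipe its output to curl with `-d @-`):

```python
import json
from pathlib import Path

def build_generate_payload(script_path, avatar_id, voice_id):
    """Build the HeyGen v2 /video/generate body; json.dumps escapes quotes/newlines."""
    text = Path(script_path).read_text().strip()
    payload = {
        "video_inputs": [{
            "character": {"type": "avatar", "avatar_id": avatar_id,
                          "avatar_style": "normal"},
            "voice": {"type": "text", "input_text": text, "voice_id": voice_id},
        }],
        "dimension": {"width": 1080, "height": 1920},
    }
    return json.dumps(payload)
```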
Poll for completion (every 15 seconds, timeout after 10 minutes):
curl -s -H "X-Api-Key: $API_KEY" \
"https://api.heygen.com/v1/video_status.get?video_id=$VIDEO_ID" \
| python3 -c "import sys,json; d=json.load(sys.stdin)['data']; print(d['status'], d.get('video_url',''))"
Statuses: pending → processing → completed (with video_url) or failed (with error).
Download completed videos:
curl -L -o "output/videos/clip-1-raw.mp4" "$VIDEO_URL"
Credit note: ~1 credit per 1 minute of video. A typical 5-clip run uses ~3 credits. Warn the user about credit usage before submitting.
If the avatar was recorded in landscape (common), the 9:16 video will show a small avatar strip centered in a large frame with background fill. Fix this with ffmpeg.
Check config.json → video.cropMode:
- "auto": detect and crop automatically
- "portrait": skip cropping (avatar was recorded in portrait)
- "manual": ask user for crop coordinates

Auto-crop pipeline:
# 1. Extract a single frame for inspection
ffmpeg -i input.mp4 -vframes 1 -y /tmp/frame.png
# 2. Use ffmpeg cropdetect to find content bounds
ffmpeg -i input.mp4 -vf "cropdetect=24:16:0" -frames:v 30 -f null - 2>&1 | grep cropdetect
# Parse the crop values from output: crop=W:H:X:Y
# 3. Crop content strip, scale up, center-crop to 1080x1920
ffmpeg -i input.mp4 \
-vf "crop=DETECTED_W:DETECTED_H:DETECTED_X:DETECTED_Y,scale=1080:-1,crop=1080:1920:0:(ih-1920)/2" \
-c:a copy \
-y output.mp4
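cropdetect prints one `crop=W:H:X:Y` candidate per analyzed frame on stderr; taking the most frequent value smooths out per-frame flicker. A sketch for the parsing step (`parse_cropdetect` is a hypothetical helper):

```python
import re
from collections import Counter

def parse_cropdetect(ffmpeg_stderr):
    """Return the most frequent W:H:X:Y crop value from cropdetect output, or None."""
    values = re.findall(r"crop=(\d+:\d+:\d+:\d+)", ffmpeg_stderr)
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]
```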
Alternative manual detection (preferred — cropdetect often fails when background is white/light):
HeyGen typically renders landscape avatars centered on a white/light background in the 9:16 frame.
Scan the center column for non-white pixels to find the actual content strip:
# Extract a frame, then scan center column for content bounds
ffmpeg -y -ss 5 -i input.mp4 -frames:v 1 /tmp/frame.png 2>/dev/null
ffmpeg -y -i /tmp/frame.png -vf "crop=1:ih:iw/2:0,format=gray" -f rawvideo -pix_fmt gray - 2>/dev/null | \
python3 -c "
import sys
data = sys.stdin.buffer.read()
first = last = None
for i, b in enumerate(data):
    if b < 240:  # Non-white pixel = actual content
        if first is None: first = i
        last = i
if first is not None:
    print(f'CONTENT_Y={first}')
    print(f'CONTENT_HEIGHT={last - first + 1}')
    print(f'CENTER={(first + last) // 2}')
else:
    print('No content bounds detected — avatar may already fill the frame')
"
Then crop the content strip, scale proportionally to fill width, and center-crop to 9:16:
ffmpeg -y -i input.mp4 \
-vf "crop=iw:CONTENT_HEIGHT:0:CONTENT_Y,scale=-1:1920,crop=1080:1920:(ow-1080)/2:0" \
-c:v libx264 -crf 23 -preset fast -c:a aac \
output.mp4
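The scale and offset numbers can be derived from the detected content strip instead of hard-coded. A sketch that builds the filter chain (`vertical_crop_filter` is a hypothetical helper; it rounds to even widths for libx264, so its output differs by a few pixels from the hand-tuned values listed below):

```python
def vertical_crop_filter(content_y, content_h, in_w=1080, out_w=1080, out_h=1920):
    """Build an ffmpeg -vf chain: crop the content strip, scale to full
    height, then center-crop horizontally to a 9:16 frame."""
    scaled_w = round(in_w * out_h / content_h)
    scaled_w -= scaled_w % 2          # libx264 requires even dimensions
    x_offset = (scaled_w - out_w) // 2
    return (f"crop={in_w}:{content_h}:0:{content_y},"
            f"scale={scaled_w}:{out_h},"
            f"crop={out_w}:{out_h}:{x_offset}:0")
```

For example, a 607-pixel strip at y=656 yields scale=3416:1920 with a 1168-pixel horizontal offset.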
Proven crop values for common HeyGen landscape avatars (1080x1920 canvas):

crop=1080:607:0:656,scale=3413:1920,crop=1080:1920:1166:0

Save processed videos to output/videos/clip-1.mp4, clip-2.mp4, etc.
If crop mode is portrait, just copy the raw files:
cp output/videos/clip-1-raw.mp4 output/videos/clip-1.mp4
Organize everything in a dated output folder:
output-YYYY-MM-DD/
├── blog.md # Final blog post
├── tweet.txt # Tweet text (posted or ready to post)
├── substack-url.txt # URL of Substack draft/post
├── scripts/
│ ├── clip-1.txt # Spoken word scripts
│ ├── clip-2.txt
│ └── ...
├── videos/
│ ├── clip-1.mp4 # Final processed vertical videos
│ ├── clip-2.mp4
│ └── ...
└── manifest.json # Run metadata
manifest.json:
{
"source": "https://youtu.be/XXXXX",
"date": "2026-02-03",
"blog": "blog.md",
"substackUrl": "https://...",
"tweetUrl": "https://...",
"clips": ["clip-1.mp4", "clip-2.mp4", "..."],
"heygenCreditsUsed": 3
}
Report the summary to the user:
Config file: skills/yt-content-engine/config.json (relative to workspace root)
| Key | Description | Default |
|-----|-------------|---------|
| heygen.apiKey | HeyGen API key | Required |
| heygen.avatarId | Your HeyGen avatar ID | Required |
| heygen.voiceId | Your cloned voice ID | Required |
| substack.publication | Substack subdomain | Required |
| twitter.handle | X/Twitter handle | Required |
| author.voice | Writing style description | Recommended |
| author.name | Author name for attribution | Recommended |
| video.clipCount | Number of clips to generate | 5 |
| video.maxClipSeconds | Max seconds per clip | 60 |
| video.cropMode | auto, portrait, or manual | auto |
Troubleshooting:
- Twitter posting fails: run bird auth to re-authenticate.
- Auto-crop misses the avatar: set cropMode to manual and eyeball the content bounds from a frame export.
- No transcript available: try summarize "URL" --extract --youtube yt-dlp for auto-generated captions, or ask the user for a manual transcript.