yt-to-blog

Full content pipeline: YouTube URL → transcript → blog post → Substack draft → X/Twitter thread → vertical video clips via HeyGen AI avatar. One URL in, entire content suite out. Use when asked to: "turn this video into content", "create a content suite from this YouTube video", "write a blog from this video", "repurpose this video", or any video-to-multi-platform content request. Can run the full pipeline or individual steps.

Install via ClawdBot CLI:

clawdbot install justinhartbiz/yt-to-blog

YouTube URL → blog post + Substack + tweets + vertical video clips. The whole content machine.
YouTube URL
↓
① Transcript (summarize CLI)
↓
② Blog Draft (AI-written in your voice)
↓
③ Substack Publish (browser automation)
↓
④ X/Twitter Post (bird CLI)
↓
④b Facebook Group (optional reminder)
↓
⑤ Script Splitter (extract hook moments)
↓
⑥ HeyGen Videos (AI avatar vertical clips)
↓
⑦ Post-Processing (ffmpeg crop/scale)
↓
📁 Output Folder (blog.md, videos, tweet.txt, URLs)
One URL in → Five platforms out. Run the whole thing or any step individually.
Walk the user through this on first use. It takes ~10 minutes once, then never again.
Run the setup script to check what's installed:
bash skills/yt-content-engine/setup.sh
Required CLIs:
| Tool | Purpose | Install |
|------|---------|---------|
| summarize | YouTube transcript extraction | brew install steipete/tap/summarize |
| bird | X/Twitter posting | brew install steipete/tap/bird |
| ffmpeg | Video post-processing | brew install ffmpeg |
| curl | API calls to HeyGen | Usually pre-installed on macOS |
| python3 | Helper scripts | Usually pre-installed on macOS |
If anything is missing, tell the user what to install and wait for confirmation.
Save the key to config.json (see config schema below), then verify it:

curl -s -H "X-Api-Key: API_KEY_HERE" https://api.heygen.com/v2/avatars | python3 -c "import sys,json; d=json.load(sys.stdin); print('✅ API key works!' if 'data' in d else '❌ Invalid key')"
Tell the user:
"For vertical video clips, you need a HeyGen avatar. Here's what matters:
>
Record in PORTRAIT mode (hold your phone vertically). This is critical — if you record landscape, the avatar will be a small strip in the center of a 9:16 frame and we'll need to crop/scale it (which works but loses quality).
>
Go to https://app.heygen.com/avatars → Create Instant Avatar → follow their recording guide. Stand in good lighting, look at camera, speak naturally for 2+ minutes.
>
Once created, grab your Avatar ID from the avatar details page."
List their existing avatars to help them pick. Note: the avatars endpoint returns both custom and stock avatars — filter for the user's custom ones (they typically appear first and have personal names):
curl -s -H "X-Api-Key: API_KEY" https://api.heygen.com/v2/avatars | python3 -c "
import sys, json
data = json.load(sys.stdin)
for a in data.get('data', {}).get('avatars', []):
    print(f\"  {a['avatar_id']} — {a.get('avatar_name', 'unnamed')}\")
"
Tell the user:
"Go to https://app.heygen.com/voice-clone → Clone your voice. Upload a clean audio sample (1-2 min of you speaking naturally). HeyGen will create a voice ID.
>
Once done, grab your Voice ID from the voice settings."
List their voices. User's cloned voices typically appear first; stock voices come after:
curl -s -H "X-Api-Key: API_KEY" https://api.heygen.com/v2/voices | python3 -c "
import sys, json
data = json.load(sys.stdin)
for v in data.get('data', {}).get('voices', []):
    print(f\"  {v['voice_id']} — {v.get('name', 'unnamed')} ({v.get('language', '?')})\")
"
⚠️ IMPORTANT: Use the FULL voice_id (e.g., 69da9c9bca78499b98fdac698d2a20cd), not a truncated version. The API will return "Voice validation failed" if you use a shortened ID.
Substack has no API — posting requires browser automation.
Open a persistent browser session with profile="openclaw", sign in at https://substack.com/sign-in, and store the publication domain in config.json. The browser session persists across restarts. One-time setup.
Create skills/yt-content-engine/config.json (relative to your workspace):
{
"heygen": {
"apiKey": "YOUR_API_KEY",
"avatarId": "YOUR_AVATAR_ID",
"voiceId": "YOUR_VOICE_ID"
},
"substack": {
"publication": "yourblog.substack.com"
},
"twitter": {
"handle": "@yourhandle"
},
"author": {
"voice": "Description of your writing voice and style",
"name": "Your Name"
},
"video": {
"clipCount": 5,
"maxClipSeconds": 60,
"cropMode": "auto"
}
}
Tip: If the user already has a voice guide from the yt-to-blog skill, read it from skills/yt-to-blog/references/voice-guide.md and use it for the author.voice field.
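The required keys above can be validated up front so the pipeline fails fast with a clear message instead of breaking mid-run. A minimal sketch, assuming the schema above (the `load_config` helper and its default-merging behavior are illustrative, not part of the skill):

```python
import json

# Dotted paths that must be present and non-empty per the config schema
REQUIRED = [
    ("heygen", "apiKey"), ("heygen", "avatarId"), ("heygen", "voiceId"),
    ("substack", "publication"), ("twitter", "handle"),
]
VIDEO_DEFAULTS = {"clipCount": 5, "maxClipSeconds": 60, "cropMode": "auto"}

def load_config(path="skills/yt-content-engine/config.json"):
    """Load config.json, reject missing required keys, fill video defaults."""
    with open(path) as f:
        cfg = json.load(f)
    missing = [f"{s}.{k}" for s, k in REQUIRED if not cfg.get(s, {}).get(k)]
    if missing:
        raise ValueError(f"config.json missing required keys: {', '.join(missing)}")
    video = cfg.setdefault("video", {})
    for key, value in VIDEO_DEFAULTS.items():
        video.setdefault(key, value)
    return cfg
```

Explicit values in the file win; only absent video keys pick up the documented defaults.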
Run the setup script with the config in place:
bash skills/yt-content-engine/setup.sh
It will test each component and report status.
"Turn this into a full content suite: https://youtu.be/XXXXX"
"Content engine this video: [URL]"
"Run the full pipeline on [URL]"
"Just get me the transcript from [URL]"
"Write a blog post from [URL]" (steps 1-2)
"Post this to Substack" (step 3, after blog exists)
"Tweet about this blog post" (step 4)
"Generate video clips from this blog" (steps 5-7)
"Just split this into scripts" (step 5 only)
Create the output directory for this run, then fetch the YouTube transcript:
OUT_DIR="/tmp/yt-content-engine/output-$(date +%Y-%m-%d)"
mkdir -p "$OUT_DIR/scripts" "$OUT_DIR/videos"
summarize "YOUTUBE_URL" --extract > /tmp/yt-content-engine/transcript.txt
The --extract flag prints the raw transcript without LLM summarization. Read the output. If it fails (no captions available), try with --youtube yt-dlp for auto-generated captions, or tell the user and suggest they provide a manual transcript.
Transform the transcript into a polished long-form blog post.
Load the author voice from config.json → author.voice. If a more detailed voice guide exists at skills/yt-to-blog/references/voice-guide.md, read and use that too.
Analysis phase — before writing, extract from the transcript:
Writing structure:
Writing rules:
Generate 3-5 headline options with distinct strategies (contrast/irony, revelation, moral framing, callback). Each with a subtitle. Let the user pick.
Save the final draft to the output folder as blog.md.
Post the blog to Substack via browser automation.
Steps:
1. Read the publication domain from config.json → substack.publication.
2. Open the persistent browser session (profile="openclaw").
3. Navigate to https://PUBLICATION.substack.com/publish/post.
4. Copy the blog to the clipboard (pbcopy < /tmp/post.md), then paste into the editor (Meta+v).

Known issues:
- Em dashes (—) may garble as ,Äî during clipboard paste → find/replace after paste.
- The drafts dashboard lives at https://PUBLICATION.substack.com/publish.

Default: save as draft. Only publish if the user explicitly says "publish it" — always confirm first.
Save the Substack URL to output/substack-url.txt.
Compose and post using the bird CLI.
Compose the tweet/thread, mentioning the handle from config.json → twitter.handle where relevant.

Post with bird:
# Single tweet
bird tweet "Your tweet text here"
# Thread (post first tweet, then reply to it)
bird tweet "Tweet 1 text here"
# Note the returned tweet ID, then:
bird reply TWEET_ID "Tweet 2 text here"
# And chain:
bird reply TWEET_2_ID "Tweet 3 text here"
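The skill doesn't prescribe how to break a long post into a thread before chaining bird replies. One hedged approach is to split on sentence boundaries under the 280-character limit; `split_thread` is a hypothetical helper, and hard-trimming a single oversized sentence is a simplification:

```python
import re

def split_thread(text, limit=280):
    """Split long text into tweet-sized chunks on sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tweets, current = [], ""
    for s in sentences:
        candidate = f"{current} {s}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                tweets.append(current)
            current = s[:limit]  # hard-trim a single oversized sentence
    if current:
        tweets.append(current)
    return tweets
```

Each returned chunk becomes one `bird tweet` / `bird reply` call in order.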
Always show the user the tweet text before posting and get confirmation.
Save tweet text to output/tweet.txt.
If config.json includes a facebook.group URL, remind the user to post to their Facebook Group.
Note: Facebook Group API posting is heavily restricted. Browser automation is unreliable due to Facebook's anti-bot measures. Best approach:
Draft the post text, save it to output/facebook-post.txt, and ask the user to paste it into the group manually. This keeps Facebook distribution in the workflow without fighting their API restrictions.
Extract 3-5 "hook moments" from the blog post and rewrite each as a spoken-word script for vertical video.
What to look for (scan the blog for these patterns):
Not every blog will have all five. Extract what's there. Minimum 3 clips.
Rewrite rules for spoken delivery:
Format each script:
CLIP 1: [descriptive title]
---
[Script text here, 75-150 words]
Use config.json → video.clipCount for the target number of clips (default: 5).
Use config.json → video.maxClipSeconds for max duration (default: 60).
Save scripts to output/scripts/clip-1.txt, clip-2.txt, etc.
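The per-clip files can be produced mechanically from the `CLIP N:` / `---` format above. A sketch, assuming the scripts arrive as one formatted string (`save_clip_scripts` is a hypothetical helper, not part of the skill):

```python
import re
from pathlib import Path

def save_clip_scripts(formatted, out_dir="output/scripts"):
    """Parse 'CLIP N: title' blocks separated by '---' and write clip-N.txt files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # Each block: a header line, a '---' divider, then the script body
    blocks = re.findall(
        r"CLIP\s+(\d+):[^\n]*\n---\n(.*?)(?=\nCLIP\s+\d+:|\Z)", formatted, re.S)
    for num, body in blocks:
        path = out / f"clip-{num}.txt"
        path.write_text(body.strip() + "\n")
        written.append(str(path))
    return written
```

The clip number in the header drives the filename, so gaps or reordering in the draft carry through unchanged.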
Submit each script to HeyGen API v2 to generate AI avatar videos.
Read config:
# Parse config.json
API_KEY=$(python3 -c "import json; c=json.load(open('config.json')); print(c['heygen']['apiKey'])")
AVATAR_ID=$(python3 -c "import json; c=json.load(open('config.json')); print(c['heygen']['avatarId'])")
VOICE_ID=$(python3 -c "import json; c=json.load(open('config.json')); print(c['heygen']['voiceId'])")
For each script, submit a video generation request:
curl -s -X POST "https://api.heygen.com/v2/video/generate" \
-H "X-Api-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"video_inputs": [{
"character": {
"type": "avatar",
"avatar_id": "'"$AVATAR_ID"'",
"avatar_style": "normal"
},
"voice": {
"type": "text",
"input_text": "'"$(cat output/scripts/clip-1.txt)"'",
"voice_id": "'"$VOICE_ID"'"
}
}],
"dimension": {
"width": 1080,
"height": 1920
}
}'
Parse the response to get video_id:
import json
response = json.loads(response_text)
video_id = response["data"]["video_id"]
Submit ALL clips before polling. HeyGen renders in parallel — submit all scripts first, collect all video_ids, then poll them all. This cuts total render time from N×3min to ~3min.
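One caveat with the curl example above: interpolating `$(cat …)` into the JSON body breaks if a script contains double quotes or raw newlines. A safer sketch builds the body with `json.dumps`, which escapes both (`build_generate_payload` is illustrative; pipe its output to curl with `-d @-`):

```python
import json
from pathlib import Path

def build_generate_payload(script_path, avatar_id, voice_id):
    """Build the HeyGen v2 /video/generate body; json.dumps escapes quotes/newlines."""
    text = Path(script_path).read_text().strip()
    payload = {
        "video_inputs": [{
            "character": {"type": "avatar", "avatar_id": avatar_id,
                          "avatar_style": "normal"},
            "voice": {"type": "text", "input_text": text, "voice_id": voice_id},
        }],
        "dimension": {"width": 1080, "height": 1920},
    }
    return json.dumps(payload)
```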
Poll for completion (every 15 seconds, timeout after 10 minutes):
curl -s -H "X-Api-Key: $API_KEY" \
"https://api.heygen.com/v1/video_status.get?video_id=$VIDEO_ID" \
| python3 -c "import sys,json; d=json.load(sys.stdin)['data']; print(d['status'], d.get('video_url',''))"
Statuses: pending → processing → completed (with video_url) or failed (with error).
Download completed videos:
curl -L -o "output/videos/clip-1-raw.mp4" "$VIDEO_URL"
Credit note: ~1 credit per 1 minute of video. A typical 5-clip run uses ~3 credits. Warn the user about credit usage before submitting.
If the avatar was recorded in landscape (common), the 9:16 video will show a small avatar strip centered in a large frame with background fill. Fix this with ffmpeg.
Check config.json → video.cropMode:
- "auto": detect and crop automatically
- "portrait": skip cropping (avatar was recorded in portrait)
- "manual": ask user for crop coordinates

Auto-crop pipeline:
# 1. Extract a single frame for inspection
ffmpeg -i input.mp4 -vframes 1 -y /tmp/frame.png
# 2. Use ffmpeg cropdetect to find content bounds
ffmpeg -i input.mp4 -vf "cropdetect=24:16:0" -frames:v 30 -f null - 2>&1 | grep cropdetect
# Parse the crop values from output: crop=W:H:X:Y
# 3. Crop content strip, scale up, center-crop to 1080x1920
ffmpeg -i input.mp4 \
-vf "crop=DETECTED_W:DETECTED_H:DETECTED_X:DETECTED_Y,scale=1080:-1,crop=1080:1920:0:(ih-1920)/2" \
-c:a copy \
-y output.mp4
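cropdetect prints one `crop=W:H:X:Y` candidate per analyzed frame on stderr; taking the most frequent value smooths out per-frame flicker. A sketch for the parsing step (`parse_cropdetect` is a hypothetical helper):

```python
import re
from collections import Counter

def parse_cropdetect(ffmpeg_stderr):
    """Return the most frequent W:H:X:Y crop value from cropdetect output, or None."""
    values = re.findall(r"crop=(\d+:\d+:\d+:\d+)", ffmpeg_stderr)
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]
```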
Alternative manual detection (preferred — cropdetect often fails when background is white/light):
HeyGen typically renders landscape avatars centered on a white/light background in the 9:16 frame.
Scan the center column for non-white pixels to find the actual content strip:
# Extract a frame, then scan center column for content bounds
ffmpeg -y -ss 5 -i input.mp4 -frames:v 1 /tmp/frame.png 2>/dev/null
ffmpeg -y -i /tmp/frame.png -vf "crop=1:ih:iw/2:0,format=gray" -f rawvideo -pix_fmt gray - 2>/dev/null | \
python3 -c "
import sys
data = sys.stdin.buffer.read()
first = last = None
for i, b in enumerate(data):
    if b < 240:  # Non-white pixel = actual content
        if first is None: first = i
        last = i
if first is not None:
    print(f'CONTENT_Y={first}')
    print(f'CONTENT_HEIGHT={last - first + 1}')
    print(f'CENTER={(first + last) // 2}')
else:
    print('No content bounds detected — avatar may already fill the frame')
"
Then crop the content strip, scale proportionally to fill width, and center-crop to 9:16:
ffmpeg -y -i input.mp4 \
-vf "crop=iw:CONTENT_HEIGHT:0:CONTENT_Y,scale=-1:1920,crop=1080:1920:(ow-1080)/2:0" \
-c:v libx264 -crf 23 -preset fast -c:a aac \
output.mp4
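The scale and offset numbers can be derived from the detected content strip instead of hard-coded. A sketch that builds the filter chain (`vertical_crop_filter` is a hypothetical helper; it rounds to even widths for libx264, so its output differs by a few pixels from the hand-tuned values listed below):

```python
def vertical_crop_filter(content_y, content_h, in_w=1080, out_w=1080, out_h=1920):
    """Build an ffmpeg -vf chain: crop the content strip, scale to full
    height, then center-crop horizontally to a 9:16 frame."""
    scaled_w = round(in_w * out_h / content_h)
    scaled_w -= scaled_w % 2          # libx264 requires even dimensions
    x_offset = (scaled_w - out_w) // 2
    return (f"crop={in_w}:{content_h}:0:{content_y},"
            f"scale={scaled_w}:{out_h},"
            f"crop={out_w}:{out_h}:{x_offset}:0")
```

For example, a 607-pixel strip at y=656 yields scale=3416:1920 with a 1168-pixel horizontal offset.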
Proven crop values for common HeyGen landscape avatars (1080x1920 canvas):

crop=1080:607:0:656,scale=3413:1920,crop=1080:1920:1166:0

Save processed videos to output/videos/clip-1.mp4, clip-2.mp4, etc.
If crop mode is portrait, just copy the raw files:
cp output/videos/clip-1-raw.mp4 output/videos/clip-1.mp4
Organize everything in a dated output folder:
output-YYYY-MM-DD/
├── blog.md # Final blog post
├── tweet.txt # Tweet text (posted or ready to post)
├── substack-url.txt # URL of Substack draft/post
├── scripts/
│ ├── clip-1.txt # Spoken word scripts
│ ├── clip-2.txt
│ └── ...
├── videos/
│ ├── clip-1.mp4 # Final processed vertical videos
│ ├── clip-2.mp4
│ └── ...
└── manifest.json # Run metadata
manifest.json:
{
"source": "https://youtu.be/XXXXX",
"date": "2026-02-03",
"blog": "blog.md",
"substackUrl": "https://...",
"tweetUrl": "https://...",
"clips": ["clip-1.mp4", "clip-2.mp4", "..."],
"heygenCreditsUsed": 3
}
Report the summary to the user:
Config file: skills/yt-content-engine/config.json (relative to workspace root)
| Key | Description | Default |
|-----|-------------|---------|
| heygen.apiKey | HeyGen API key | Required |
| heygen.avatarId | Your HeyGen avatar ID | Required |
| heygen.voiceId | Your cloned voice ID | Required |
| substack.publication | Substack subdomain | Required |
| twitter.handle | X/Twitter handle | Required |
| author.voice | Writing style description | Recommended |
| author.name | Author name for attribution | Recommended |
| video.clipCount | Number of clips to generate | 5 |
| video.maxClipSeconds | Max seconds per clip | 60 |
| video.cropMode | auto, portrait, or manual | auto |
Troubleshooting:
- Twitter posting fails: run bird auth to re-authenticate.
- Auto-crop misses the avatar: set cropMode to manual and eyeball the content bounds from a frame export.
- No transcript available: try summarize "URL" --extract --youtube yt-dlp for auto-generated captions, or ask the user for a manual transcript.