ai-podcast-pipeline
Create Korean AI podcast packages from QuickView trend notes. Use for dual-host script writing (Callie × Nick), Gemini multi-speaker TTS audio generation, subtitle timing/render fixes, thumbnail+MP4 packaging, and YouTube title/description output. Supports both full (15~20 min) and compressed (5~7 min) editions.
Install via ClawdBot CLI:
clawdbot install jeong-wooseok/ai-podcast-pipeline
This skill may trigger antivirus false positives due to legitimate use of environment variables (GEMINI_API_KEY). All code is open source and auditable in this repository. No malicious behavior.
Build end-to-end podcast assets from Trend/QuickView-* content.
Prefer weekly QuickView file from your configured Quartz root.
If user gives wk.aiee.app URL, map to local Quartz markdown first.
Read and apply:
references/podcast_prompt_template_ko.md
Modes: full (15~20 min) and compressed (5~7 min)
Rules:
archive/
# Set API key via environment (required)
export GEMINI_API_KEY="<YOUR_KEY>"
# Run from skills/ai-podcast-pipeline/
python3 scripts/build_dualvoice_audio.py \
--input <script.txt> \
--outdir <outdir> \
--basename podcast_full_dualvoice \
--chunk-lines 6
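The --chunk-lines 6 flag above suggests the builder sends dialogue to TTS in batches of six lines. A minimal sketch of that batching (hypothetical helper name; the shipped script's internals may differ):

```python
def chunk_lines(lines, chunk_size=6):
    """Split dialogue lines into batches of chunk_size, mirroring --chunk-lines 6."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]

dialogue = [f"line {n}" for n in range(14)]
batches = chunk_lines(dialogue, 6)  # batch sizes: 6, 6, 2
```

Smaller chunks mean more API calls but smaller retries on failure; larger chunks risk hitting per-request limits.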
python3 scripts/gemini_multispeaker_tts.py \
--input-file <dialogue.txt> \
--outdir <outdir> \
--basename podcast_dualvoice \
--retries 3 \
--timeout-seconds 120
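The --retries 3 / --timeout-seconds 120 pair implies the script wraps each TTS request in a retry loop. A sketch of that pattern (hypothetical helper, not the script's actual code):

```python
import time

def call_with_retries(fn, retries=3, delay_seconds=1.0):
    """Call fn, retrying on failure up to `retries` total attempts (mirrors --retries 3)."""
    last_exc = None
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except Exception as exc:  # the real script would catch API/timeout errors here
            last_exc = exc
            if attempt < retries:
                time.sleep(delay_seconds)
    raise last_exc
```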
Default voice mapping (fixed 2026-02-10): Kore / Puck
Output: MP3 (default delivery format)
Use full-text subtitle builder (no ... truncation):
python3 scripts/build_korean_srt.py \
--script <script.txt> \
--audio <final.mp3> \
--output <outdir>/podcast.srt \
--max-chars 22
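The --max-chars 22 flag caps the width of each rendered subtitle line. A greedy word-wrap sketch of that constraint (hypothetical helper; the builder may segment Korean text differently):

```python
def wrap_subtitle(text, max_chars=22):
    """Greedy word wrap so no subtitle line exceeds max_chars (mirrors --max-chars 22)."""
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines
```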
Use renderer with adjustable font and timing shift:
python3 scripts/render_subtitled_video.py \
--image <thumbnail.png> \
--audio <final.mp3> \
--srt <podcast.srt> \
--output <outdir>/final.mp4 \
--font-name "Do Hyeon" \
--font-size 27 \
--shift-ms -250
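A negative --shift-ms moves every cue earlier by that many milliseconds. The timestamp arithmetic can be sketched as follows (hypothetical helper; the renderer's actual implementation may differ):

```python
def shift_timestamp(ts, shift_ms):
    """Shift an SRT timestamp 'HH:MM:SS,mmm' by shift_ms (negative = earlier), clamped at zero."""
    h, m, s_ms = ts.split(":")
    s, ms = s_ms.split(",")
    total = ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms) + shift_ms
    total = max(total, 0)  # a cue cannot start before the audio does
    sec, ms_part = divmod(total, 1000)
    return f"{sec // 3600:02d}:{sec % 3600 // 60:02d}:{sec % 60:02d},{ms_part:03d}"

shift_timestamp("00:01:05,100", -250)  # -> "00:01:04,850"
```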
Notes:
--shift-ms negative = subtitle earlier (for lag fixes)
--font-size adjustable (e.g., 25~27)
# Set API key via environment (required)
export GEMINI_API_KEY="<YOUR_KEY>"
python3 scripts/build_podcast_assets.py \
--source "<QuickView path or URL>"
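Since every script requires GEMINI_API_KEY, a wrapper can fail fast before launching the pipeline. A sketch of that guard (hypothetical wrapper, not part of the shipped scripts):

```python
import os

def build_command(source):
    """Assemble the packaging invocation; refuse to run without GEMINI_API_KEY set."""
    if not os.environ.get("GEMINI_API_KEY"):
        raise RuntimeError("Set GEMINI_API_KEY before running the pipeline.")
    return ["python3", "scripts/build_podcast_assets.py", "--source", source]
```

The returned list can be handed to subprocess.run so the source path never goes through shell quoting.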
Reference (layout/copy guardrails):
references/thumbnail_guidelines_ko.md
Always include:
build_dualvoice_audio.py
--shift-ms (usually -150 to -300)
GEMINI_API_KEY via environment, not hardcoded
references/podcast_prompt_template_ko.md
references/workflow_runbook.md
references/thumbnail_guidelines_ko.md