youtube-voice-summarizer-elevenlabs
Transform YouTube videos into podcast-style voice summaries using ElevenLabs TTS
Install via ClawdBot CLI:
clawdbot install Franciscoandsam/youtube-voice-summarizer-elevenlabs
Transform any YouTube video into a professional voice summary delivered in under 60 seconds.
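When a user sends a YouTube URL, this skill fetches the video transcript (Supadata), summarizes it (OpenRouter), and converts the summary to speech (ElevenLabs).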
This skill requires a running backend server. Deploy the summarizer service:
git clone https://github.com/Franciscomoney/elevenlabs-moltbot.git
cd elevenlabs-moltbot
npm install
cp .env.example .env
# Add your API keys to .env
npm start
| Service | Purpose | Get Key |
|---------|---------|---------|
| ElevenLabs | Text-to-speech | https://elevenlabs.io |
| Supadata | YouTube transcripts | https://supadata.ai |
| OpenRouter | AI summarization | https://openrouter.ai |
When the user sends a YouTube URL, start a summarization job:
curl -s -X POST http://127.0.0.1:3050/api/summarize \
-H "Content-Type: application/json" \
-d '{"url":"YOUTUBE_URL","length":"short","voice":"podcast"}'
The server responds with a job ID: {"jobId": "job_xxx", "status": "processing"}
Poll the job status, substituting the returned jobId for JOB_ID:
curl -s http://127.0.0.1:3050/api/status/JOB_ID
Keep polling until status is "completed".
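The polling step could be scripted roughly as follows. This is a minimal sketch, not the skill's own client: the function names, the 2-second interval, and the 60-attempt cap are assumptions, and since no failure status is documented here, the loop only guards against polling forever.

```python
import json
import time
import urllib.request

BASE_URL = "http://127.0.0.1:3050"  # host and port taken from the curl examples


def fetch_status(job_id):
    """GET /api/status/<job_id> and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/api/status/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch=fetch_status, interval=2.0,
                 max_attempts=60, sleep=time.sleep):
    """Poll until status is "completed"; raise TimeoutError after max_attempts.

    interval and max_attempts are assumed values, not documented limits.
    """
    for _ in range(max_attempts):
        job = fetch(job_id)
        if job.get("status") == "completed":
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} not completed after {max_attempts} polls")
```

Calling `wait_for_job("job_xxx")` with the jobId returned by /api/summarize blocks until the job completes. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running server.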
When complete, the response includes:
- result.audioUrl - The MP3 audio URL (send this to the user!)
- result.teaser - Short hook text about the content
- result.summary - Full text summary
- result.keyPoints - Array of key takeaways
Send the user the audio URL, optionally with the teaser and key points.
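One way to turn a completed response into a chat message is sketched below. The field names come from the list above; the message layout and the `format_reply` name are illustrative assumptions, not something the skill prescribes.

```python
def format_reply(job):
    """Build the text that accompanies the audio link.

    Expects the completed status response, with the fields under job["result"].
    The layout (teaser, link, bullet list) is an assumed presentation.
    """
    result = job["result"]
    lines = [result["teaser"], "", f"Listen: {result['audioUrl']}"]
    key_points = result.get("keyPoints") or []
    if key_points:
        lines.append("")
        lines.append("Key points:")
        lines.extend(f"- {point}" for point in key_points)
    return "\n".join(lines)
```

The full `result.summary` is left out of the message here to keep it short; include it if the user asks for the text version.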
| Voice | Style |
|-------|-------|
| podcast | Deep male narrator (default) |
| news | British authoritative |
| casual | Friendly conversational |
| female_warm | Warm female voice |
| Length | Duration | Best For |
|--------|----------|----------|
| short | 1-2 min | Quick overview |
| medium | 3-5 min | Balanced detail |
| detailed | 5-10 min | Comprehensive |
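A client may want to validate the voice and length options before calling the backend. The sketch below builds the JSON body for /api/summarize from the values in the two tables above. The `build_payload` name is hypothetical, and defaulting length to "short" is an assumption based on the examples (only the podcast voice is documented as a default).

```python
import json

# Option values copied from the voice and length tables.
VOICES = {"podcast", "news", "casual", "female_warm"}
LENGTHS = {"short", "medium", "detailed"}


def build_payload(url, length="short", voice="podcast"):
    """Return the JSON request body, rejecting options not listed in the tables."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if length not in LENGTHS:
        raise ValueError(f"unknown length: {length!r}")
    return json.dumps({"url": url, "length": length, "voice": voice})
```

The returned string can be passed directly as the `-d` argument of the curl command or as the body of any HTTP client request.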
User: "Summarize this: https://www.youtube.com/watch?v=dQw4w9WgXcQ"
curl -s -X POST http://127.0.0.1:3050/api/summarize \
-H "Content-Type: application/json" \
-d '{"url":"https://www.youtube.com/watch?v=dQw4w9WgXcQ","length":"short","voice":"podcast"}'
For faster, cheaper text-only summaries:
curl -s -X POST http://127.0.0.1:3050/api/quick-summary \
-H "Content-Type: application/json" \
-d '{"url":"YOUTUBE_URL","length":"short"}'
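The same text-only request can be prepared from Python with the standard library. This is a sketch under the assumption that the endpoint accepts exactly the two fields shown in the curl command; the response shape is not documented here, so the example stops at building the request. The `quick_summary_request` name is hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:3050"  # host and port taken from the curl examples


def quick_summary_request(url, length="short"):
    """Build (but do not send) the POST request for /api/quick-summary."""
    body = json.dumps({"url": url, "length": length}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/quick-summary",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(...)` and decode the JSON body once the backend is running.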
"Video may not have captions"
Audio URL not working
| Service | Cost |
|---------|------|
| Supadata | ~$0.001 |
| OpenRouter | ~$0.005-0.02 |
| ElevenLabs | ~$0.05-0.15 |
| Total | ~$0.06-0.17 |
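The Total row can be checked by summing the low and high ends of each service's range. The sketch below assumes the figures are USD estimates for a single summary, which the table does not state explicitly.

```python
from decimal import Decimal

# (low, high) estimates copied from the table; the unit is assumed to be USD per summary.
COSTS = {
    "Supadata": (Decimal("0.001"), Decimal("0.001")),
    "OpenRouter": (Decimal("0.005"), Decimal("0.02")),
    "ElevenLabs": (Decimal("0.05"), Decimal("0.15")),
}


def total_range(costs=COSTS):
    """Sum the low and high estimates across all services."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


low, high = total_range()
print(f"~${low:.2f}-{high:.2f}")  # prints ~$0.06-0.17
```

The exact sums are 0.056 and 0.171, which round to the table's ~$0.06-0.17. ElevenLabs accounts for most of the cost at both ends of the range.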
Generated Mar 1, 2026
Students can quickly grasp key points from lengthy educational YouTube lectures or tutorials by generating concise audio summaries. This helps in efficient study sessions and revision, especially for auditory learners who benefit from listening to content.
Business professionals can summarize competitor product demos, industry webinars, or market trend videos into brief audio reports. This saves time during research phases and allows teams to stay updated without watching full videos.
This skill converts video content into audio summaries, making YouTube videos more accessible for individuals with visual impairments. It provides an alternative way to consume information through natural-sounding speech output.
Content creators and social media managers can generate quick audio teasers or summaries from YouTube videos to repurpose content for podcasts, promotional clips, or audience engagement. It streamlines workflow by automating summarization and voiceover tasks.
Companies can use this skill to summarize training videos or onboarding materials into short audio clips for employees. This facilitates quick learning and retention, especially for remote teams or those with busy schedules.
Offer a limited number of free summaries per month with basic voices, then charge a monthly subscription for unlimited access, premium voices, and longer summary lengths. This model attracts users with free trials and monetizes heavy usage.
Provide API access to developers and businesses who integrate the summarization service into their own applications, charging per summary based on length and voice options. This scales with usage and targets tech-savvy customers.
Sell customized enterprise packages to companies for internal use, such as in education or corporate training, with bulk pricing, dedicated support, and integration assistance. This model focuses on high-value, long-term contracts.
💬 Integration Tip
Ensure the backend server is properly configured with all required API keys and publicly accessible to handle user requests seamlessly. Test with various YouTube URLs to confirm transcript availability and audio output quality.