audio-cog
AI audio generation powered by CellCog. Text-to-speech, voice synthesis, voiceovers, podcast audio, narration, music generation, background music, sound design. Professional audio creation with AI.
Install via ClawdBot CLI:
clawdbot install nitishgargiitd/audio-cog
Create professional audio with AI - from voiceovers and narration to background music and sound design.
This skill requires the cellcog skill for SDK setup and API calls.
clawhub install cellcog
Read the cellcog skill first for SDK setup. This skill shows you what's possible.
Quick pattern (v1.0+):
# Fire-and-forget - returns immediately
result = client.create_chat(
prompt="[your audio request]",
notify_session_key="agent:main:main",
task_label="audio-task",
chat_mode="agent" # Agent mode is optimal for all audio tasks
)
# Daemon notifies you when complete - do NOT poll
Convert text to natural-sounding speech:
CellCog provides 8 high-quality voices with distinct characteristics:
| Voice | Gender | Best For | Characteristics |
|-------|--------|----------|-----------------|
| cedar | Male | Product videos, announcements | Warm, resonant, authoritative, trustworthy |
| marin | Female | Professional content, tutorials | Bright, articulate, emotionally agile |
| ballad | Male | Storytelling, flowing narratives | Smooth, melodic, musical quality |
| coral | Female | Energetic content, ads | Vibrant, lively, dynamic, spirited |
| echo | Male | Thoughtful content, documentaries | Calm, measured, deliberate |
| sage | Female | Educational, knowledge content | Wise, contemplative, reflective |
| shimmer | Female | Gentle content, wellness | Soft, gentle, soothing, approachable |
| verse | Male | Creative, artistic content | Poetic, rhythmic, expressive |
For product videos and announcements:
Use cedar (male) or marin (female) - both project confidence and professionalism.
For storytelling and audiobooks:
Use ballad (male) or sage (female) - designed for engaging, flowing narratives.
For high-energy content:
Use coral (female) - vibrant and dynamic, perfect for ads and exciting announcements.
For calm, educational content:
Use echo (male) or shimmer (female) - measured pacing ideal for learning.
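The recommendations above can be captured as a small lookup so that an agent picks a voice consistently. This is a minimal sketch, not part of the CellCog SDK: the category keys (`product`, `storytelling`, `high_energy`, `educational`) and the `pick_voice` helper are hypothetical names chosen for illustration, and only the voice names themselves come from the table.

```python
from typing import Optional

# Recommendations transcribed from the guidance above.
# The category keys are hypothetical labels, not CellCog identifiers.
RECOMMENDED = {
    "product": {"male": "cedar", "female": "marin"},
    "storytelling": {"male": "ballad", "female": "sage"},
    "high_energy": {"female": "coral"},
    "educational": {"male": "echo", "female": "shimmer"},
}


def pick_voice(category: str, gender: Optional[str] = None) -> str:
    """Return a recommended voice name for a content category."""
    if category not in RECOMMENDED:
        raise ValueError(f"unknown category: {category!r}")
    options = RECOMMENDED[category]
    if gender is None:
        # No preference given: take the first listed recommendation.
        return next(iter(options.values()))
    if gender not in options:
        raise ValueError(f"no {gender} voice recommended for {category!r}")
    return options[gender]
```

For example, `pick_voice("storytelling", "female")` returns `"sage"`. The returned name is meant to be written into the prompt text, as in the style-instruction example that follows.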
Beyond selecting a voice, you can fine-tune delivery with style instructions.
Example with style instructions:
"Generate voiceover using cedar voice with a warm, conversational tone. Speak at medium pace with slight enthusiasm when mentioning features. American accent."
Create original background music and soundtracks:
| Parameter | Options |
|-----------|---------|
| Duration | 15 seconds to 5+ minutes |
| Genre | Electronic, rock, classical, jazz, ambient, lo-fi, cinematic, pop, hip-hop |
| Tempo | 60 BPM (slow) to 180+ BPM (fast) |
| Mood | Upbeat, calm, dramatic, mysterious, inspiring, melancholic |
| Instruments | Piano, guitar, synth, strings, drums, brass, etc. |
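The parameters in this table are expressed in the prompt text rather than as API arguments. The sketch below assembles such a prompt from structured values. The `build_music_prompt` and `format_duration` helpers are hypothetical, and treating the table's lower bounds (15 seconds, 60 BPM) as hard limits is an assumption made for illustration.

```python
from typing import Optional, Sequence


def format_duration(seconds: int) -> str:
    """Render whole minutes as minutes, everything else as seconds."""
    if seconds % 60 == 0:
        minutes = seconds // 60
        return f"{minutes} minute" + ("" if minutes == 1 else "s")
    return f"{seconds} seconds"


def build_music_prompt(
    duration_s: int,
    genre: str,
    mood: str,
    tempo_bpm: Optional[int] = None,
    instruments: Sequence[str] = (),
) -> str:
    """Compose a natural-language music request from the table's parameters."""
    # Assumption: the table's lower bounds are enforced as minimums.
    if duration_s < 15:
        raise ValueError("duration should be at least 15 seconds")
    if tempo_bpm is not None and tempo_bpm < 60:
        raise ValueError("tempo should be at least 60 BPM")
    prompt = f"Generate {format_duration(duration_s)} of {mood} {genre} background music."
    if instruments:
        prompt += " Include " + ", ".join(instruments) + "."
    if tempo_bpm is not None:
        prompt += f" {tempo_bpm} BPM."
    return prompt
```

Calling `build_music_prompt(120, "lo-fi", "calm", tempo_bpm=75, instruments=["soft piano", "mellow beats"])` yields `"Generate 2 minutes of calm lo-fi background music. Include soft piano, mellow beats. 75 BPM."`, which can then be passed as the `prompt` argument of `create_chat`.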
All AI-generated music from CellCog is royalty-free and fully yours to use commercially.
You have complete rights to use the generated music.
No attribution required. No licensing fees. The music is generated uniquely for you.
| Format | Best For |
|--------|----------|
| MP3 | Standard audio delivery, voiceovers, music |
| Combined with video | Background music for video-cog outputs |
Use chat_mode="agent" for all audio generation tasks.
Audio generation, whether voiceovers, music, or sound design, executes efficiently in agent mode. CellCog's audio capabilities don't require multi-angle deliberation; they require precise execution, which agent mode excels at.
There's no scenario where agent team mode provides meaningfully better audio output. Save agent team for research and complex creative work that benefits from multiple reasoning passes.
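One way to make this guidance hard to get wrong is a thin wrapper that always sets agent mode. This is a sketch under assumptions: `submit_audio_task` is a hypothetical helper, `client` is whatever object the cellcog skill's SDK setup gives you, and the only call used is `create_chat` with the keyword arguments shown in the quick pattern earlier in this document.

```python
def submit_audio_task(
    client,
    prompt: str,
    task_label: str = "audio-task",
    notify_session_key: str = "agent:main:main",
):
    """Submit an audio request with chat_mode pinned to "agent".

    `client` is assumed to expose create_chat() as shown in the quick
    pattern; the default label and session key are copied from that
    example. The call returns immediately, so do not poll for the result.
    """
    return client.create_chat(
        prompt=prompt,
        notify_session_key=notify_session_key,
        task_label=task_label,
        chat_mode="agent",  # audio tasks do not benefit from agent team mode
    )
```

Callers then only supply the prompt and, optionally, a label, for example `submit_audio_task(client, "Generate 30 seconds of ambient music.", task_label="ambient")`.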
Professional voiceover with specific voice:
"Generate a professional voiceover using the marin voice for this script:
'Introducing TaskFlow - the project management tool that actually works. With intelligent automation, seamless collaboration, and powerful analytics, TaskFlow helps teams do their best work.'
Style: Confident and friendly, medium pace. Suitable for a product launch video."
Podcast intro with voice selection:
"Create a podcast intro voiceover using cedar voice:
'Welcome to Future Forward, the podcast where we explore the technologies shaping tomorrow. I'm your host, and today we're diving into...'
Style: Warm and engaging, conversational tone. Also generate a 10-second upbeat intro music bed to go underneath."
Background music:
"Generate 2 minutes of calm, lo-fi hip-hop style background music. Should be chill and unobtrusive, good for studying or working. Include soft piano, mellow beats, and gentle vinyl crackle. 75 BPM."
Audiobook narration:
"Create an audiobook-style narration using ballad voice for this passage:
[passage text]
Style: Warm storytelling quality, measured pace with appropriate pauses for drama."
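The voiceover examples above share a shape: a voice name, the script in quotes, and a closing `Style:` line. The sketch below assembles that shape from parts. `build_voiceover_prompt` and its `tone`, `pace`, and `accent` parameters are hypothetical names for illustration; CellCog receives only the resulting text, and the exact wording it responds to best is not specified here.

```python
from typing import Optional

# Voice names from the voice table in this document.
VOICES = {"cedar", "marin", "ballad", "coral", "echo", "sage", "shimmer", "verse"}


def build_voiceover_prompt(
    script: str,
    voice: str,
    tone: Optional[str] = None,
    pace: Optional[str] = None,
    accent: Optional[str] = None,
) -> str:
    """Compose a voiceover request: voice, script, then an optional Style line."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    lines = [
        f"Generate a voiceover using the {voice} voice for this script:",
        f"'{script}'",
    ]
    # Collect only the style hints that were actually provided.
    style = []
    if tone:
        style.append(f"{tone} tone")
    if pace:
        style.append(f"{pace} pace")
    if accent:
        style.append(f"{accent} accent")
    if style:
        lines.append("Style: " + ", ".join(style) + ".")
    return "\n".join(lines)
```

Rejecting unknown voice names before sending keeps a typo such as `"ceder"` from silently producing a request with an unintended voice.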
Cinematic music:
"Generate 90 seconds of cinematic orchestral music for a tech company's 'About Us' video. Start soft and inspiring, build to a confident crescendo, then resolve to a hopeful ending."
CellCog can generate speech in 50+ languages.
Specify the language in your prompt:
"Generate this text in Japanese with a native female speaker using shimmer voice: 'いらっしゃいませ...'"
Generated Mar 1, 2026
Educational platforms can use the skill to generate clear, instructional voiceovers for training modules and audiobook-style narrations for courses. This automates audio production for scalable online learning materials.
Podcasters and media companies can create professional intros, jingles, and background music to enhance audio content. The royalty-free music generation supports monetized streaming without licensing issues.
Businesses can generate high-energy voiceovers for ads, product videos, and announcements using voices like cedar or coral. This speeds up campaign production with AI-driven audio assets.
Developers can create ambient soundtracks, background music, and voice prompts for apps and games. The skill supports custom durations and moods, ideal for immersive user experiences.
Companies can produce professional phone menu prompts and instructional voiceovers for internal training. This reduces costs by automating audio for customer service and employee onboarding.
Offer a subscription-based platform where users generate voiceovers and music for videos, podcasts, and ads. Revenue comes from tiered plans based on usage limits and premium features.
Freelancers or agencies use the skill to provide quick, low-cost audio creation services for clients in marketing, education, or entertainment. Charge per project or hourly for custom audio outputs.
Build a marketplace where creators sell AI-generated audio assets like background music or voiceovers. Take a commission on sales, leveraging the royalty-free licensing to attract buyers.
💬 Integration Tip
Install the cellcog dependency first and use chat_mode='agent' for efficient audio generation without polling.