anima
Anima Avatar - Interactive Video Generation Engine. Generates 16:9 videos with dynamic character sprites (Shutiao), synced audio (Fish Audio), and text overlay.
Install via ClawdBot CLI:
clawdbot install HMyaoyuan/anima
Generates high-quality interactive videos where Shutiao speaks the text with appropriate expressions, gestures, and voice.
src/director.js: The core engine. Generates frames (sharp + SVG), audio (Fish Audio), and video (FFmpeg).
src/send_video_pro.js: Delivery script. Handles transcoding, duration calculation, and Feishu upload.
src/batch_generator.js: Batch sprite generator. Uses Gemini image generation to produce sprite variants.
assets/sprites/: The sprite library (1920x1080 PNG files).
assets/production_plan.csv: The asset registry (25 sprites).
assets/manifest.json: Sprite metadata for reference.
output/: Generated videos.

ClawHub only distributes text files. The sprite PNG images are not included in the published package.
After installing, follow the steps below in order to prepare your sprites before first use.
All image generation steps use the Gemini API (Nano Banana) as the AI image generator. It works by "reference image + text prompt": you give it an existing image and a text description of what to change, and it returns a new image with the changes applied. This is how both the base sprite (character + background fusion) and all expression variants are created.
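A minimal sketch of what a "reference image + text prompt" request can look like, assuming the public Gemini generateContent REST endpoint. The model name and the helper buildImageEditRequest are assumptions for illustration and are not taken from this skill's source; nothing is sent over the network.

```javascript
// Hypothetical helper: pairs one reference image with a text instruction
// in the shape used by the Gemini generateContent REST API.
const GEMINI_MODEL = 'gemini-2.5-flash-image'; // assumption: use the image model you have access to

function buildImageEditRequest(imageBuffer, prompt, mimeType = 'image/png') {
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${GEMINI_MODEL}:generateContent`,
    body: {
      contents: [
        {
          parts: [
            { text: prompt }, // what to change
            { inline_data: { mime_type: mimeType, data: imageBuffer.toString('base64') } }, // the reference image
          ],
        },
      ],
    },
  };
}

const req = buildImageEditRequest(Buffer.from('fake-png-bytes'), 'change facial expression to big happy smile');
console.log(req.url);
```

The returned object can be posted with any HTTP client, with GEMINI_API_KEY supplied as the API key.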
You need a standalone character illustration (transparent background PNG recommended).
Save it somewhere accessible (e.g. avatars/my_character.png).
You need a background scene for the character to stand in.
Save it at: assets/backgrounds/ (e.g. assets/backgrounds/cherry_blossom_bg.png).
This step uses Gemini (Nano Banana) image generation to merge your character onto the background. The AI sees both images and creates a natural-looking composite. This is NOT a simple overlay/paste, but an AI-generated fusion that handles lighting, shadows, and blending.
How to do it:
Method A: Use Gemini directly (recommended)
Use any Gemini-compatible image generation tool (like Nano Banana, Google AI Studio, or the Gemini API), giving it both your character image and your background image as references.
Save the output as: assets/sprites/shutiao_base.png
Method B: Use the built-in compose script (simple overlay)
If you just want a quick mechanical overlay (no AI blending), src/compose_base.js can paste your character onto the background using sharp:
1. Edit src/compose_base.js: update BG_PATH and AVATAR_PATH to point to your files.
2. Run: node src/compose_base.js
3. The output is written to assets/sprites/shutiao_base.png

Note: Method B is a plain image composite. Method A (Gemini) produces much better results because it handles lighting and integration naturally.
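For orientation, a plain overlay with sharp might look like the sketch below. This is not the contents of src/compose_base.js: the placement rule (avatar scaled to 90% of the frame height, centered on the bottom edge) and the helper bottomCenterOffset are assumptions. BG_PATH and AVATAR_PATH reuse the names and example paths mentioned above.

```javascript
// Illustrative overlay only; the real compose script may place the avatar differently.
const BG_PATH = 'assets/backgrounds/cherry_blossom_bg.png';
const AVATAR_PATH = 'avatars/my_character.png';
const OUT_PATH = 'assets/sprites/shutiao_base.png';

// Center the avatar horizontally and rest it on the bottom edge of the frame.
function bottomCenterOffset(bgWidth, bgHeight, avatarWidth, avatarHeight) {
  return {
    left: Math.round((bgWidth - avatarWidth) / 2),
    top: bgHeight - avatarHeight,
  };
}

async function composeBase() {
  const sharp = require('sharp'); // loaded lazily so the helper above works without it
  const canvas = { width: 1920, height: 1080 }; // sprite size used by this skill

  // Scale the avatar to 90% of the frame height and keep its alpha channel.
  const avatar = await sharp(AVATAR_PATH)
    .resize({ height: Math.round(canvas.height * 0.9) })
    .png()
    .toBuffer({ resolveWithObject: true });

  const { left, top } = bottomCenterOffset(
    canvas.width, canvas.height, avatar.info.width, avatar.info.height
  );

  // Crop/scale the background to the canvas, then paste the avatar on top.
  await sharp(BG_PATH)
    .resize(canvas.width, canvas.height, { fit: 'cover' })
    .composite([{ input: avatar.data, left, top }])
    .toFile(OUT_PATH);
}
```

Call composeBase() from a script to produce the file. An avatar that is wider than the canvas after scaling would need an extra width limit.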
Now that you have a base sprite, plan what expression/pose variants you want.
Open assets/production_plan.csv and customize it:
ID,Emotion,Variant,Description,Filename,Prompt,Status
001,Base,v1,Standard,shutiao_base.png,gentle smile looking at viewer,Done
003,Happy,v1,Smile,shutiao_happy.png,big happy smile eyes closed,Pending
007,Angry,v1,Pout,shutiao_angry.png,angry face pouting,Pending
...
Column meanings:
Filename: the output file name, with the shutiao_ prefix and .png extension (e.g. shutiao_happy.png).
Status: Pending = will be generated. Done = already exists, skip.

The default CSV has 25 entries. You can add, remove, or modify rows freely.
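To see which rows would be generated, the plan can be read with a few lines of Node. This is a minimal sketch assuming no quoted fields and no commas inside a cell, which matches the sample rows above; parsePlan is a hypothetical helper, not part of the skill.

```javascript
// Minimal reader for production_plan.csv (no quoted fields, no embedded commas).
function parsePlan(csvText) {
  const [headerLine, ...lines] = csvText.trim().split(/\r?\n/);
  const headers = headerLine.split(',').map((h) => h.trim());
  return lines
    .filter((line) => line.trim() !== '')
    .map((line) => {
      const cells = line.split(',');
      // Build one object per row, keyed by column name.
      return Object.fromEntries(headers.map((h, i) => [h, (cells[i] || '').trim()]));
    });
}

const sample = [
  'ID,Emotion,Variant,Description,Filename,Prompt,Status',
  '001,Base,v1,Standard,shutiao_base.png,gentle smile looking at viewer,Done',
  '003,Happy,v1,Smile,shutiao_happy.png,big happy smile eyes closed,Pending',
].join('\n');

const pending = parsePlan(sample).filter((row) => row.Status === 'Pending');
console.log(pending.map((row) => row.Filename)); // [ 'shutiao_happy.png' ]
```

A prompt that itself contains commas would break this naive split; a real CSV parser is the safer choice in that case.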
This step uses Gemini (Nano Banana) image generation again. For each Pending row, the batch generator sends your base sprite + the prompt to Gemini, asking: "Same image, change facial expression to [prompt]. Keep clothes and background exactly same."
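The instruction quoted above can be expressed as a small template. buildVariantPrompt is a hypothetical name; only the wording of the instruction comes from this README.

```javascript
// Wraps a row's Prompt column in the instruction quoted above.
function buildVariantPrompt(rowPrompt) {
  return `Same image, change facial expression to ${rowPrompt}. Keep clothes and background exactly same.`;
}

console.log(buildVariantPrompt('big happy smile eyes closed'));
```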
1. Set your key in skills/anima/.env: GEMINI_API_KEY=your_key_here
2. Make sure assets/sprites/shutiao_base.png (or shutiao_base_1k.png) exists from Step 3.
3. Run: node skills/anima/src/batch_generator.js
What happens:
The script reads production_plan.csv, picks the rows with Status=Pending, saves each generated image to assets/sprites/, and marks the row Status=Done.

Check that assets/sprites/ now has a PNG file for every row in production_plan.csv:
ls assets/sprites/*.png | wc -l
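The file count alone does not show which sprites are missing. A sketch that compares the Filename column against the sprite folder, assuming the simple comma-separated layout shown earlier; findMissingSprites is a hypothetical helper.

```javascript
const fs = require('fs');
const path = require('path');

// Returns the Filename values from the plan that are not present in `existing`.
function findMissingSprites(csvText, existing) {
  const [header, ...rows] = csvText.trim().split(/\r?\n/);
  const col = header.split(',').indexOf('Filename');
  const have = new Set(existing);
  return rows
    .map((row) => row.split(',')[col])
    .filter((name) => name && !have.has(name));
}

// Runs against the real folders when they exist; otherwise does nothing.
const planPath = path.join('assets', 'production_plan.csv');
const spriteDir = path.join('assets', 'sprites');
if (fs.existsSync(planPath) && fs.existsSync(spriteDir)) {
  console.log(findMissingSprites(fs.readFileSync(planPath, 'utf8'), fs.readdirSync(spriteDir)));
}
```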
Then do a quick test run:
node skills/anima/run.js --preview --script '[{"text":"Test","emotion":"Happy"}]'
Check the generated frame at temp/frame_0.png; you should see your character with the text overlay.
If a sprite is missing at runtime, the director will fall back to a white background with a warning in the console.
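The fallback described above could be modeled as follows. This is only a sketch: the mapping from emotion to file name is inferred from the sample CSV rows (Happy maps to shutiao_happy.png), and resolveSprite is a hypothetical name rather than a function from src/director.js.

```javascript
// Looks up the sprite for an emotion and reports when a fallback is needed.
function resolveSprite(emotion, availableFiles) {
  const fileName = `shutiao_${emotion.toLowerCase()}.png`; // assumed naming rule
  if (availableFiles.includes(fileName)) {
    return { fileName, fallback: false };
  }
  console.warn(`Sprite ${fileName} not found; using a white background instead`);
  return { fileName: null, fallback: true };
}

console.log(resolveSprite('Happy', ['shutiao_base.png', 'shutiao_happy.png']));
```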
macOS: brew install ffmpeg
Debian/Ubuntu: sudo apt install ffmpeg

Install inside the skill folder:
cd skills/anima
npm install
The only native dependency is sharp, which ships prebuilt binaries for all major platforms via N-API. It does not need recompilation when Node versions change: install once, run everywhere.
This skill depends on external services: Fish Audio is required for TTS, while Gemini and Feishu/Lark are optional. You need to provide your own API keys.

Fish Audio (required, TTS)
Used in src/director.js (the generateAudio() function).
FISH_AUDIO_KEY: Your API key (starts with sk-... or a hex string).
FISH_AUDIO_REF_ID: The voice model reference ID. You can use Fish Audio's default models or clone your own voice.

Gemini (optional, sprite generation)
Used in src/batch_generator.js (only needed if you want to create new sprite variants). batch_generator.js calls the Gemini API directly via curl.
GEMINI_API_KEY: Your Gemini API key.

Feishu/Lark (optional, delivery)
Used in src/send_video_pro.js.
FEISHU_APP_ID: Your Feishu app ID.
FEISHU_APP_SECRET: Your Feishu app secret.
Not needed in --preview mode.

Create a .env file inside the skill folder (skills/anima/.env):
# Fish Audio (Required for TTS)
FISH_AUDIO_KEY=your_key_here
FISH_AUDIO_REF_ID=your_model_ref_id_here
# Gemini (Optional, for sprite generation)
GEMINI_API_KEY=your_key_here
# Feishu/Lark (Optional, for delivery)
FEISHU_APP_ID=cli_...
FEISHU_APP_SECRET=...
Important: The .env file is loaded from the skill folder first (least-privilege). Never commit .env files; the .clawignore already excludes them.
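A minimal sketch of loading a .env file from the skill folder, for readers who want to reproduce the behavior in their own scripts. parseEnv and loadSkillEnv are hypothetical helpers, and letting already-set process variables win is an assumption, not documented behavior of this skill.

```javascript
const fs = require('fs');
const path = require('path');

// Parses KEY=value lines, skipping blanks and # comments.
// Quoting and variable expansion are not handled.
function parseEnv(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue;
    env[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return env;
}

// Reads <skillDir>/.env if present. Variables already set in the process are kept.
function loadSkillEnv(skillDir) {
  const file = path.join(skillDir, '.env');
  if (!fs.existsSync(file)) return {};
  const parsed = parseEnv(fs.readFileSync(file, 'utf8'));
  for (const [key, value] of Object.entries(parsed)) {
    if (process.env[key] === undefined) process.env[key] = value;
  }
  return parsed;
}
```

For example, loadSkillEnv('skills/anima') would pick up FISH_AUDIO_KEY and the other keys listed above.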
# Basic usage (Demo script)
node skills/anima/run.js --target "ou_..."
# With custom script (JSON string)
node skills/anima/run.js --target "ou_..." --script '[{"text":"Hello World","emotion":"Happy"}]'
# With custom script (File)
node skills/anima/run.js --target "ou_..." --script "path/to/script.json"
# Preview only (No upload)
node skills/anima/run.js --script '[{"text":"Test","emotion":"Happy"}]' --preview
node skills/anima/run.js --target "<open_id>" --script '[{"text":"Hello","emotion":"Happy"}]'
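The --script flag accepts either inline JSON or a file path. One way to handle both is sketched below; loadScript is a hypothetical helper and the "starts with [" check is an assumption about how to tell the two forms apart, not the logic used by run.js.

```javascript
const fs = require('fs');

// Accepts inline JSON (starting with "[") or a path to a JSON file.
function loadScript(arg) {
  const trimmed = arg.trim();
  const text = trimmed.startsWith('[') ? trimmed : fs.readFileSync(trimmed, 'utf8');
  const scenes = JSON.parse(text);
  if (!Array.isArray(scenes)) {
    throw new Error('script must be a JSON array of scenes');
  }
  return scenes;
}

console.log(loadScript('[{"text":"Hello","emotion":"Happy"}]').length); // 1
```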
Each scene in the script is a JSON object:
[
{ "text": "Hello boss!", "emotion": "Happy" },
{ "text": "Let me think...", "emotion": "Think" },
{ "text": "I got it!", "emotion": "Action" }
]
Available emotions: Base, Happy, Angry, Shy, Think, Sad, Action.
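A script can be checked against this format before rendering. The sketch below uses the emotion list above; validateScript is a hypothetical helper, not part of the skill.

```javascript
const EMOTIONS = ['Base', 'Happy', 'Angry', 'Shy', 'Think', 'Sad', 'Action'];

// Returns a list of problems; an empty list means the script looks valid.
function validateScript(scenes) {
  if (!Array.isArray(scenes)) return ['script must be an array'];
  const problems = [];
  scenes.forEach((scene, i) => {
    if (scene === null || typeof scene !== 'object') {
      problems.push(`scene ${i}: must be an object`);
      return;
    }
    if (typeof scene.text !== 'string' || scene.text.trim() === '') {
      problems.push(`scene ${i}: "text" must be a non-empty string`);
    }
    if (!EMOTIONS.includes(scene.emotion)) {
      problems.push(`scene ${i}: unknown emotion "${scene.emotion}"`);
    }
  });
  return problems;
}

console.log(validateScript([
  { text: 'Hello boss!', emotion: 'Happy' },
  { text: 'Hmm', emotion: 'Joy' },
])); // [ 'scene 1: unknown emotion "Joy"' ]
```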
To use a different TTS provider (e.g., OpenAI, ElevenLabs):
1. Open src/director.js.
2. Replace the generateAudio(text, filename) function.
3. Return { path: "/path/to/audio.wav", duration: 1.5 } (duration in seconds).

To add new expressions or poses after the initial setup:
1. Add a row to assets/production_plan.csv with Status=Pending.
2. Write a descriptive prompt (e.g. angry expression, arms crossed, looking away).
3. Run node src/batch_generator.js; it will only process Pending rows.
4. Confirm the new sprite is picked up by loadSprites().

See ASSETS_PLAN.md for the full production matrix and design philosophy.
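For the custom TTS case, a replacement generateAudio has to report the clip duration in seconds. The sketch below derives it from a PCM WAV with the canonical 44-byte header; files with extra chunks need a real parser. The third parameter synthesize is hypothetical and stands in for whatever call your provider's SDK offers; it must resolve to the WAV bytes.

```javascript
const fs = require('fs');

// Duration of a canonical PCM WAV: data chunk size (offset 40) / byte rate (offset 28).
function wavDurationSeconds(buffer) {
  const isWav =
    buffer.toString('ascii', 0, 4) === 'RIFF' &&
    buffer.toString('ascii', 8, 12) === 'WAVE';
  if (!isWav) throw new Error('not a RIFF/WAVE file');
  const byteRate = buffer.readUInt32LE(28);
  const dataSize = buffer.readUInt32LE(40);
  return dataSize / byteRate;
}

// Same return shape as described above: { path, duration } with duration in seconds.
async function generateAudio(text, filename, synthesize) {
  const wav = await synthesize(text); // hypothetical provider call returning a Buffer
  fs.writeFileSync(filename, wav);
  return { path: filename, duration: wavDurationSeconds(wav) };
}
```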
send_video_pro.js calculates duration in ms and passes it to both upload and message payload.
Check the ffmpeg transcoding logs and verify source frame images in temp/frame_*.png.
If CJK text does not render, install CJK fonts (e.g. sudo apt install fonts-noto-cjk).
If FISH_AUDIO_KEY is missing, the skill falls back to the macOS say command (English only).

Generated Mar 1, 2026
Educators and content creators can generate animated explainer videos with a custom avatar that narrates lessons or tutorials. The dynamic sprites allow the avatar to show expressions like thinking or excitement, making complex topics more engaging and easier to understand for students.
Companies can produce interactive training videos where an animated spokesperson delivers compliance or onboarding materials. The AI-synced audio and expressive sprites help maintain viewer attention and convey professionalism, with videos uploaded directly to platforms like Feishu for easy employee access.
Brands can create promotional videos featuring a mascot or character that speaks about products or services. The ability to generate multiple sprite variants (e.g., happy, action poses) enables dynamic storytelling, enhancing brand identity and customer engagement across social media or websites.
Writers and animators can produce short animated stories or interactive narratives with characters that speak dialogue. The engine's parallel rendering and audio-sync capabilities allow for efficient production of high-quality videos, ideal for web series, gaming content, or fan creations.
Businesses can develop helpdesk videos where an avatar guides users through troubleshooting steps or software features. The realistic voice synthesis and emotion-based sprites (e.g., thinking, shy) make instructions clearer and more relatable, improving user experience and reducing support calls.
Offer a cloud-based platform where users pay a monthly fee to access the video generation engine with limited sprite libraries and rendering credits. Include premium tiers for higher-quality audio, more sprites, and faster processing, targeting small businesses and individual creators.
Provide a free version with basic features and watermarked videos, while charging for advanced options like custom sprite generation, ad-free exports, and integration with platforms like Feishu. Monetize through in-app purchases or one-time upgrades for professional users.
Sell customized licenses to large organizations for internal use in training, marketing, or communications. Include dedicated support, API access for bulk video generation, and integration with corporate systems, with pricing based on usage volume and features.
Integration Tip
Ensure you have a Gemini API key and high-quality character/background images ready before starting; the sprite generation process requires careful planning in the CSV file to avoid errors.