UNITH Digital Humans Skill

Create, configure, update, and deploy AI-powered Digital Human avatars using the UNITH API.

Quick Overview

UNITH digital humans are AI avatars that can speak, converse, and interact with users. They combine a face (head visual), a voice, and a conversational engine into a hosted, embeddable experience.

Base API URL: https://platform-api.unith.ai

Docs: https://docs.unith.ai

Prerequisites

The user must supply the following credentials (stored as environment variables):

| Variable | Description | How to obtain |
|----------|-------------|---------------|
| UNITH_EMAIL | Account email | Register at https://unith.ai |
| UNITH_SECRET_KEY | Non-expiring secret key | UNITH dashboard → Manage Account → "Secret Key" section → Generate |

⚠️ The secret key is displayed only once. If lost, the user must delete and regenerate it.
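
Before running any script, it can help to confirm that both variables are actually present. The following is a minimal sketch; the function name `unith_check_env` is hypothetical and is not part of the skill's scripts.

```shell
# Hypothetical helper (not shipped with the UNITH scripts): report which of
# the two required credentials are missing from the environment.
unith_check_env() {
  local name value missing=0
  for name in UNITH_EMAIL UNITH_SECRET_KEY; do
    # Indirect lookup via eval keeps this usable in plain sh as well as bash.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `unith_check_env && source scripts/auth.sh` would then skip authentication when a credential is absent.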

Authentication

All API calls require a Bearer token (valid 7 days). Use the auth script:

source scripts/auth.sh

This validates credentials, retries on network errors, and exports UNITH_TOKEN. On failure, it prints specific guidance (wrong key, expired token, etc.).
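
For ad hoc requests outside the bundled scripts, the exported token goes into a standard Bearer header. A small sketch, assuming `UNITH_TOKEN` has been exported by `scripts/auth.sh`; the helper name `unith_auth_header` is hypothetical, and the endpoint in the usage comment is a placeholder rather than a documented route.

```shell
# Hypothetical helper: build the Authorization header from the token that
# scripts/auth.sh exports. Fails early if the token is absent.
unith_auth_header() {
  if [ -z "${UNITH_TOKEN:-}" ]; then
    echo "UNITH_TOKEN is not set; run 'source scripts/auth.sh' first" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s\n' "$UNITH_TOKEN"
}

# Example use (the path is a placeholder, not a documented endpoint):
#   curl -H "$(unith_auth_header)" "https://platform-api.unith.ai/<endpoint>"
```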

Workflow: Creating a Digital Human

Step 1: Choose an Operating Mode

Ask the user what they want the digital human to do. Map their answer to one of 5 modes:

Complexity spectrum (simple → sophisticated):

Simplest: ttt — just text in, video out. No knowledge base needed.
Standard: oc — conversational with a system prompt. Good for general assistants.
Knowledge-grounded: doc_qa — upload documents, avatar answers from them. Best for support/FAQ.
Workflow-driven: voiceflow — structured conversation paths. Requires Voiceflow account.
Most flexible: plugin — BYO conversational engine. Maximum control.
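
The mapping from a user's request to a mode can be sketched as a simple keyword match. This is an illustration only: the function name `suggest_mode` and the keywords are assumptions, chosen to mirror the example requests listed under Common Patterns, and a real agent should still confirm the choice with the user.

```shell
# Hypothetical helper: suggest one of the five modes from a free-text request.
# The keyword list is illustrative, not exhaustive.
suggest_mode() {
  local request
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$request" in
    *"own llm"*|*"own engine"*|*plugin*) echo plugin ;;
    *voiceflow*|*onboarding*|*guided*)   echo voiceflow ;;
    *support*|*faq*|*document*)          echo doc_qa ;;
    *video*)                             echo ttt ;;
    *sales*|*assistant*)                 echo oc ;;
    *)
      echo "No mode matched; ask the user a follow-up question" >&2
      return 1
      ;;
  esac
}
```

For example, `suggest_mode "I want a customer support avatar"` prints `doc_qa`.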

Step 2: List Available Faces

bash scripts/list-resources.sh faces

Each face has an id (used as headVisualId in creation). Faces can be:

Public: Available to all organizations
Private: Available only to the user's organization
Custom (BYOF): User uploads a video of a real person (currently managed by UNITH)

Present the available faces to the user and let them choose.

Step 3: List Available Voices

bash scripts/list-resources.sh voices

Voices come from providers: elevenlabs, azure, audiostack. Present options to the user. Voices have performance rankings — faster voices are better for real-time conversation.

Step 4: Create the Digital Human

Build a JSON payload file (see references/api-payloads.md for the schema per mode), then:

bash scripts/create-head.sh payload.json --dry-run   # validate first
bash scripts/create-head.sh payload.json              # create

The script validates required fields, checks mode-specific requirements, retries on server errors, and prints the publicUrl on success.
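
The two-step sequence above can be wrapped so that creation only runs when validation passes. This sketch assumes that `create-head.sh` exits non-zero when `--dry-run` validation fails, which the source does not state explicitly. The function name and the `UNITH_CREATE_SCRIPT` override are hypothetical and exist only to make the wrapper testable.

```shell
# Hypothetical wrapper: create a head only if the dry run succeeds.
# Assumption: create-head.sh returns a non-zero status on a failed dry run.
create_head_checked() {
  local payload=$1
  local script=${UNITH_CREATE_SCRIPT:-scripts/create-head.sh}
  if [ ! -f "$payload" ]; then
    echo "Payload file not found: $payload" >&2
    return 1
  fi
  # Validate first; stop here if the payload is rejected.
  bash "$script" "$payload" --dry-run || {
    echo "Dry run failed; fix the payload before creating" >&2
    return 1
  }
  # Validation passed, so perform the real creation.
  bash "$script" "$payload"
}
```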

Step 5 (doc_qa only): Upload Knowledge Document

For doc_qa mode, the digital human needs a knowledge document:

bash scripts/upload-document.sh <headId> /path/to/document.pdf

The script checks file existence/size, uses a longer timeout for uploads, and provides guidance on next steps.

Step 6: Test and Iterate

The digital human is live at the publicUrl from Step 4. The user should:

Visit the URL and test the conversation
Update configuration as needed (see below)

Updating a Digital Human

Use the update script to modify any parameter except the face (changing the face requires creating a new head):

bash scripts/update-head.sh <headId> updates.json                         # from a JSON file
bash scripts/update-head.sh <headId> --field ttsVoice=rachel              # single field
bash scripts/update-head.sh <headId> --field ttsVoice=rachel --field greetings="Hi!"  # multiple fields
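
For the file-based form, the sketch below writes a small updates.json. It assumes the JSON keys mirror the `--field` names shown above (`ttsVoice`, `greetings`); the authoritative list of parameters is in references/configuration.md. The function name `write_updates_json` is hypothetical.

```shell
# Hypothetical helper: write an updates file with a voice and a greeting.
# Assumption: the JSON keys are the same names that --field accepts.
write_updates_json() {
  local out=$1 voice=$2 greeting=$3
  # No JSON escaping is done here, so reject characters that would need it.
  case "$voice$greeting" in
    *'"'*|*\\*)
      echo "Values must not contain double quotes or backslashes" >&2
      return 1
      ;;
  esac
  printf '{\n  "ttsVoice": "%s",\n  "greetings": "%s"\n}\n' "$voice" "$greeting" > "$out"
}
```

After `write_updates_json updates.json rachel "Hi!"`, the file can be passed to `update-head.sh` as in the first command above.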

Listing Existing Digital Humans

bash scripts/list-resources.sh heads           # list all
bash scripts/list-resources.sh head <headId>   # get details for one

Deleting a Digital Human

bash scripts/delete-head.sh <headId> --confirm     # always use --confirm in automated/agent contexts

This permanently removes the digital human and cannot be undone.

Agent note: Always pass --confirm when calling this script. Without it, the script prompts for interactive input and will hang.
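
In agent code, one way to make the rule above hard to forget is a thin wrapper that always appends the flag. A minimal sketch; the function name and the `UNITH_DELETE_SCRIPT` override are hypothetical and not part of the skill.

```shell
# Hypothetical wrapper for automated use: refuses an empty head ID and always
# passes --confirm so the delete script never waits for interactive input.
delete_head_noninteractive() {
  local head_id=${1:-}
  local script=${UNITH_DELETE_SCRIPT:-scripts/delete-head.sh}
  if [ -z "$head_id" ]; then
    echo "Usage: delete_head_noninteractive <headId>" >&2
    return 2
  fi
  # Redirecting stdin from /dev/null guards against a hang if a prompt
  # were to appear anyway.
  bash "$script" "$head_id" --confirm < /dev/null
}
```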

Embedding

Digital humans can be embedded in websites/apps. See references/embedding.md for code snippets and configuration options.

Scripts

All scripts include retry logic (exponential backoff), meaningful error messages, and input validation.

| Script | Purpose |
|--------|---------|
| scripts/_utils.sh | Shared utilities: retry wrapper, colored logging, error parsing |
| scripts/auth.sh | Authenticate and export UNITH_TOKEN (with 6-day token caching) |
| scripts/list-resources.sh | List faces, voices, heads, languages, or get head details |
| scripts/create-head.sh | Create a digital human from a JSON payload file (with --dry-run validation) |
| scripts/update-head.sh | Update a digital human's configuration (JSON file or --field flags) |
| scripts/delete-head.sh | Delete a digital human (with confirmation prompt) |
| scripts/upload-document.sh | Upload knowledge document to a doc_qa head |

Configuration via environment variables:

UNITH_MAX_RETRIES — max retry attempts (default: 3)
UNITH_RETRY_DELAY — initial delay between retries in seconds (default: 2, doubles each retry)
UNITH_CURL_TIMEOUT — curl timeout in seconds (default: 30, 120 for uploads)
UNITH_CONNECT_TIMEOUT — connection timeout in seconds (default: 10)
UNITH_TOKEN_CACHE — token cache file path (default: /tmp/.unith_token_cache, set empty to disable)
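
To illustrate how the retry settings interact, here is a sketch of an exponential-backoff wrapper driven by `UNITH_MAX_RETRIES` and `UNITH_RETRY_DELAY`. It is not the actual code in scripts/_utils.sh. It assumes that `UNITH_MAX_RETRIES` counts retries after the first attempt, which the source does not specify.

```shell
# Illustrative retry wrapper (not the real _utils.sh implementation).
# Assumption: UNITH_MAX_RETRIES is the number of retries after the first try.
retry_with_backoff() {
  local max_retries=${UNITH_MAX_RETRIES:-3}
  local delay=${UNITH_RETRY_DELAY:-2}
  local attempt=0
  while true; do
    # Run the command; stop as soon as it succeeds.
    "$@" && return 0
    if [ "$attempt" -ge "$max_retries" ]; then
      echo "Giving up after $attempt retries: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "Retry $attempt/$max_retries in ${delay}s" >&2
    sleep "$delay"
    # Double the wait before the next retry.
    delay=$((delay * 2))
  done
}
```

With the defaults, a command that keeps failing would wait 2, 4, and then 8 seconds between attempts before the wrapper gives up.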

Detailed API Reference

For full payload schemas, configuration parameters, and mode-specific details:

Read references/api-payloads.md      # Full request/response schemas per mode
Read references/configuration.md     # All configurable parameters
Read references/embedding.md         # Embedding code and options

Common Patterns

"I want a quick video of someone saying X" → ttt mode, minimal config

"I want a customer support avatar" → doc_qa mode with knowledge docs

"I want an AI sales rep" → oc mode with a sales personality prompt

"I want to connect my own LLM" → plugin mode with webhook URL

"I want a guided onboarding flow" → voiceflow mode with Voiceflow API key

Information to Collect from the User

Before creating, ask for:

Purpose / use case → determines operating mode
Face preference → list available faces for selection
Voice preference → language, accent, gender, speed priority
Alias → display name for the digital human
Language → speech recognition and UI language (e.g., en-US, es-ES)
Greeting message → initial message the avatar says
System prompt (for oc/doc_qa) → personality and behavior instructions
Knowledge documents (for doc_qa) → files to upload
Voiceflow API key (for voiceflow) → from their Voiceflow account
Plugin URL (for plugin) → webhook endpoint for their custom engine
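
The mode-specific items in this checklist can be expressed as a lookup, so an agent knows what else to ask for once the mode is chosen. A minimal sketch with a hypothetical function name; the items common to every mode (face, voice, alias, language, greeting) are deliberately left out.

```shell
# Hypothetical helper: print the extra inputs each mode needs, one per line,
# following the checklist above. ttt needs nothing beyond the common items.
mode_specific_inputs() {
  case "$1" in
    ttt)       ;;
    oc)        echo "system prompt" ;;
    doc_qa)    echo "system prompt"; echo "knowledge documents" ;;
    voiceflow) echo "Voiceflow API key" ;;
    plugin)    echo "plugin URL" ;;
    *)
      echo "Unknown mode: $1" >&2
      return 1
      ;;
  esac
}
```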

Digital Clawatar v1.0.2

Install & Quick Start