openclaw-aisa-llm-gateway
Unified LLM Gateway - One API for 70+ AI models. Route to GPT, Claude, Gemini, Qwen, Deepseek, Grok and more with a single API key.
Install via ClawdBot CLI:
clawdbot install 0xjordansg-yolo/openclaw-aisa-llm-gateway
Unified LLM Gateway for autonomous agents. Powered by AIsa.
One API key. 70+ models. OpenAI-compatible.
Replace 10+ provider API keys with one. Access GPT-4, Claude-3, Gemini, Qwen, Deepseek, Grok, and more through a unified, OpenAI-compatible API.
"Chat with GPT-4 for reasoning, switch to Claude for creative writing"
"Compare responses from GPT-4, Claude, and Gemini for the same question"
"Analyze this image with GPT-4o - what objects are in it?"
"Route simple queries to fast/cheap models, complex queries to GPT-4"
"If GPT-4 fails, automatically try Claude, then Gemini"
| Feature | LLM Router | Direct APIs |
|---------|------------|-------------|
| API Keys | 1 | 10+ |
| SDK Compatibility | OpenAI SDK | Multiple SDKs |
| Billing | Unified | Per-provider |
| Model Switching | Change string | Code rewrite |
| Fallback Routing | Built-in | DIY |
| Cost Tracking | Unified | Fragmented |

Supported model families:

| Family | Developer | Example Models |
|--------|-----------|----------------|
| GPT | OpenAI | gpt-4.1, gpt-4o, gpt-4o-mini, o1, o1-mini, o3-mini |
| Claude | Anthropic | claude-3-5-sonnet, claude-3-opus, claude-3-sonnet |
| Gemini | Google | gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash |
| Qwen | Alibaba | qwen-max, qwen-plus, qwen2.5-72b-instruct |
| Deepseek | Deepseek | deepseek-chat, deepseek-coder, deepseek-v3, deepseek-r1 |
| Grok | xAI | grok-2, grok-beta |
Note: Model availability may vary. Check marketplace.aisa.one/pricing for the full list of currently available models and pricing.
export AISA_API_KEY="your-key"
POST https://api.aisa.one/v1/chat/completions
curl -X POST "https://api.aisa.one/v1/chat/completions" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
  }'
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model identifier (e.g., gpt-4.1, claude-3-sonnet) |
| messages | array | Yes | Conversation messages |
| temperature | number | No | Randomness (0-2, default: 1) |
| max_tokens | integer | No | Maximum response tokens |
| stream | boolean | No | Enable streaming (default: false) |
| top_p | number | No | Nucleus sampling (0-1) |
| frequency_penalty | number | No | Frequency penalty (-2 to 2) |
| presence_penalty | number | No | Presence penalty (-2 to 2) |
| stop | string/array | No | Stop sequences |
{
  "role": "user|assistant|system",
  "content": "message text or array for multimodal"
}
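The parameter table and message format above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the shipped client: `build_chat_payload` is a hypothetical helper, and the ranges it enforces are simply the ones listed in the table.

```python
from typing import Any

# Ranges copied from the parameter table: name -> (min, max).
_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}
_ROLES = {"system", "user", "assistant"}


def build_chat_payload(model: str, messages: list[dict[str, Any]], **options: Any) -> dict[str, Any]:
    """Return a request body for /v1/chat/completions or raise ValueError.

    Hypothetical helper: it only validates the fields documented above.
    """
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    for msg in messages:
        if msg.get("role") not in _ROLES:
            raise ValueError(f"unknown role: {msg.get('role')!r}")
        if "content" not in msg:
            raise ValueError("each message needs a 'content' field")
    for name, (low, high) in _RANGES.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    # Drop unset options so the gateway applies its own defaults.
    payload: dict[str, Any] = {"model": model, "messages": messages}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

Passing `temperature=0.7` keeps the field in the body, while `max_tokens=None` omits it entirely rather than sending a null.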
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 200,
    "total_tokens": 250,
    "cost": 0.0025
  }
}
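Reading the response shape above usually comes down to a few fields. As a rough sketch, the hypothetical `parse_chat_response` below pulls out the reply text, the finish reason, and the usage figures. It assumes the body matches the example exactly and falls back to zero when `usage` is missing.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ChatResult:
    text: str
    finish_reason: str
    total_tokens: int
    cost: float


def parse_chat_response(body: dict[str, Any]) -> ChatResult:
    """Extract the first choice and usage figures from a completion body."""
    choice = body["choices"][0]
    usage = body.get("usage", {})
    return ChatResult(
        text=choice["message"]["content"],
        finish_reason=choice.get("finish_reason") or "",
        total_tokens=usage.get("total_tokens", 0),
        cost=usage.get("cost", 0.0),
    )
```

A `finish_reason` other than `"stop"` (for example a length cut-off) is worth checking before treating `text` as a complete answer.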
curl -X POST "https://api.aisa.one/v1/chat/completions" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-sonnet",
    "messages": [{"role": "user", "content": "Write a poem about AI."}],
    "stream": true
  }'
Streaming returns Server-Sent Events (SSE):
data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":"In"}}]}
data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":" circuits"}}]}
...
data: [DONE]
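The SSE lines shown above can be turned back into text with a small parser. This sketch assumes each event arrives as a single `data:` line shaped like the samples, and that `[DONE]` marks the end of the stream. `iter_stream_text` is a hypothetical name and not part of the bundled client.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from an iterable of raw SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Any line iterator works as input, for instance the decoded lines of an HTTP response body, so the parser can be exercised without a network connection.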
Analyze images by passing image URLs or base64 data:
curl -X POST "https://api.aisa.one/v1/chat/completions" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
      }
    ]
  }'
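The request above uses an image URL. For the base64 route, one common approach in OpenAI-compatible APIs is to place a `data:` URL in the same `image_url` field. The sketch below assumes the gateway accepts that form, which this document does not confirm. The helper names are hypothetical.

```python
import base64
import mimetypes
from typing import Any


def image_part_from_bytes(data: bytes, mime_type: str = "image/jpeg") -> dict[str, Any]:
    """Wrap raw image bytes as an image_url content part using a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


def image_part_from_file(path: str) -> dict[str, Any]:
    """Read a local file and guess its MIME type from the file extension."""
    mime_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as fh:
        return image_part_from_bytes(fh.read(), mime_type)


def vision_message(prompt: str, image_part: dict[str, Any]) -> dict[str, Any]:
    """Build a user message that pairs a text prompt with one image part."""
    return {"role": "user", "content": [{"type": "text", "text": prompt}, image_part]}
```

The resulting message goes into the `messages` array exactly where the URL-based message sits in the curl example.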
Enable tools/functions for structured outputs:
curl -X POST "https://api.aisa.one/v1/chat/completions" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "What is the weather in Tokyo?"}],
    "functions": [
      {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
          },
          "required": ["location"]
        }
      }
    ],
    "function_call": "auto"
  }'
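The request above only declares the function. Running it is left to the caller. The sketch below assumes the gateway mirrors the OpenAI `functions` response shape, where the assistant message carries a `function_call` object with a `name` and a JSON-encoded `arguments` string, and where the result is sent back as a message with role `function`. None of that is confirmed here. `get_weather` is a stub and `dispatch_function_call` is a hypothetical helper.

```python
import json
from typing import Any, Callable, Optional


def get_weather(location: str, unit: str = "celsius") -> dict[str, Any]:
    # Hypothetical stub; a real implementation would query a weather service.
    return {"location": location, "unit": unit, "forecast": "unknown"}


# Map of function names declared in the request to local callables.
HANDLERS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_function_call(message: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Run the function requested by the model and build the follow-up message.

    Returns None when the model answered in plain text instead.
    """
    call = message.get("function_call")
    if not call:
        return None
    handler = HANDLERS.get(call["name"])
    if handler is None:
        raise KeyError(f"no handler registered for {call['name']!r}")
    arguments = json.loads(call.get("arguments") or "{}")
    result = handler(**arguments)
    # Assumed follow-up format: role "function" with the JSON-encoded result.
    return {"role": "function", "name": call["name"], "content": json.dumps(result)}
```

Appending the returned message to the conversation and sending it again lets the model phrase a final answer from the function result.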
For Gemini models, you can also use the native format:
POST https://api.aisa.one/v1/models/{model}:generateContent
curl -X POST "https://api.aisa.one/v1/models/gemini-2.0-flash:generateContent" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "Explain machine learning."}]
      }
    ],
    "generationConfig": {
      "temperature": 0.7,
      "maxOutputTokens": 1000
    }
  }'
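When both formats are in play, it helps to derive the Gemini body from an OpenAI-style message list. The sketch below maps `assistant` to Gemini's `model` role and moves system messages into `systemInstruction`, following Google's public request schema. Whether the gateway forwards `systemInstruction` unchanged is an assumption. `to_gemini_request` is a hypothetical helper that only handles plain string content.

```python
from typing import Any, Optional


def to_gemini_request(
    messages: list[dict[str, Any]],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> dict[str, Any]:
    """Convert OpenAI-style chat messages into a generateContent request body."""
    contents: list[dict[str, Any]] = []
    system_parts: list[dict[str, str]] = []
    for msg in messages:
        part = {"text": msg["content"]}
        if msg["role"] == "system":
            system_parts.append(part)  # Gemini keeps system text outside contents
        else:
            role = "model" if msg["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [part]})
    body: dict[str, Any] = {"contents": contents}
    if system_parts:
        body["systemInstruction"] = {"parts": system_parts}
    config: dict[str, Any] = {}
    if temperature is not None:
        config["temperature"] = temperature
    if max_tokens is not None:
        config["maxOutputTokens"] = max_tokens
    if config:
        body["generationConfig"] = config
    return body
```

Called with a single user message, `temperature=0.7`, and `max_tokens=1000`, it reproduces the body used in the curl example.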
No installation required - uses standard library only.
# Basic completion
python3 {baseDir}/scripts/llm_router_client.py chat --model gpt-4.1 --message "Hello, world!"
# With system prompt
python3 {baseDir}/scripts/llm_router_client.py chat --model claude-3-sonnet --system "You are a poet" --message "Write about the moon"
# Streaming
python3 {baseDir}/scripts/llm_router_client.py chat --model gpt-4o --message "Tell me a story" --stream
# Multi-turn conversation
python3 {baseDir}/scripts/llm_router_client.py chat --model qwen-max --messages '[{"role":"user","content":"Hi"},{"role":"assistant","content":"Hello!"},{"role":"user","content":"How are you?"}]'
# Vision analysis
python3 {baseDir}/scripts/llm_router_client.py vision --model gpt-4o --image "https://example.com/image.jpg" --prompt "Describe this image"
# List supported models
python3 {baseDir}/scripts/llm_router_client.py models
# Compare models
python3 {baseDir}/scripts/llm_router_client.py compare --models "gpt-4.1,claude-3-sonnet,gemini-2.0-flash" --message "What is 2+2?"
from llm_router_client import LLMRouterClient

client = LLMRouterClient()  # Uses AISA_API_KEY env var

# Simple chat
response = client.chat(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response["choices"][0]["message"]["content"])

# With options
response = client.chat(
    model="claude-3-sonnet",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain relativity."}
    ],
    temperature=0.7,
    max_tokens=500
)

# Streaming
for chunk in client.chat_stream(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a story."}]
):
    print(chunk, end="", flush=True)

# Vision
response = client.vision(
    model="gpt-4o",
    image_url="https://example.com/image.jpg",
    prompt="What's in this image?"
)

# Compare models
results = client.compare_models(
    models=["gpt-4.1", "claude-3-sonnet", "gemini-2.0-flash"],
    message="Explain quantum computing"
)
for model, result in results.items():
    print(f"{model}: {result['response'][:100]}...")
Use cheaper models for simple tasks:
def smart_route(message: str) -> dict:
    # Simple queries -> fast/cheap model
    if len(message) < 50:
        model = "gpt-3.5-turbo"
    # Complex reasoning -> powerful model
    else:
        model = "gpt-4.1"
    # client.chat returns the full response dict, not just the reply text
    return client.chat(model=model, messages=[{"role": "user", "content": message}])
Automatic fallback on failure:
def chat_with_fallback(message: str) -> dict:
    models = ["gpt-4.1", "claude-3-sonnet", "gemini-2.0-flash"]
    last_error = None
    for model in models:
        try:
            return client.chat(model=model, messages=[{"role": "user", "content": message}])
        except Exception as exc:
            # Remember why this model failed, then try the next one
            last_error = exc
    raise RuntimeError("All models failed") from last_error
Compare model outputs:
results = client.compare_models(
    models=["gpt-4.1", "claude-3-opus"],
    message="Analyze this quarterly report..."
)

# Log for analysis (log_response is a placeholder for your own logging function)
for model, result in results.items():
    log_response(model=model, latency=result["latency"], cost=result["cost"])
Choose the best model for each task:
MODEL_MAP = {
    "code": "deepseek-coder",
    "creative": "claude-3-opus",
    "fast": "gpt-3.5-turbo",
    "vision": "gpt-4o",
    "chinese": "qwen-max",
    "reasoning": "gpt-4.1"
}

def route_by_task(task_type: str, message: str) -> dict:
    # Unknown task types fall back to gpt-4.1
    model = MODEL_MAP.get(task_type, "gpt-4.1")
    return client.chat(model=model, messages=[{"role": "user", "content": message}])
Errors return JSON with an error field:
{
  "error": {
    "code": "model_not_found",
    "message": "Model 'xyz' is not available"
  }
}
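The error body above, together with the HTTP status codes this section lists, can be folded into one exception type so callers can decide whether to retry. This is a sketch under stated assumptions: treating 429 and 500 as retryable is a common convention rather than documented gateway behavior, and `GatewayError`, `parse_error`, and `backoff_delay` are hypothetical names.

```python
import json

# Assumption: rate limits and server errors are transient; auth, credit,
# and model-not-found errors will not succeed on retry.
RETRYABLE_STATUS = {429, 500}


class GatewayError(Exception):
    def __init__(self, status: int, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.retryable = status in RETRYABLE_STATUS


def parse_error(status: int, body: str) -> GatewayError:
    """Build a GatewayError from an HTTP status and a raw response body."""
    try:
        err = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        err = {}  # body was not JSON, or not a JSON object
    if not isinstance(err, dict):
        err = {}
    return GatewayError(status, err.get("code", "unknown"), err.get("message", body))


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds: base, 2*base, 4*base, ... up to cap."""
    return min(cap, base * 2 ** attempt)
```

A retry loop would sleep for `backoff_delay(attempt)` only when `retryable` is true and re-raise immediately otherwise.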
Common error codes:
401 - Invalid or missing API key
402 - Insufficient credits
404 - Model not found
429 - Rate limit exceeded
500 - Server error
Just change the base URL and key:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AISA_API_KEY"],
    base_url="https://api.aisa.one/v1"
)

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
Token-based pricing varies by model. Check marketplace.aisa.one/pricing for current rates.
| Model Family | Approximate Cost |
|--------------|------------------|
| GPT-4.1 / GPT-4o | ~$0.01 / 1K tokens |
| Claude-3-Sonnet | ~$0.01 / 1K tokens |
| Gemini-2.0-Flash | ~$0.001 / 1K tokens |
| Qwen-Max | ~$0.005 / 1K tokens |
| DeepSeek-V3 | ~$0.002 / 1K tokens |
Every response includes usage.cost and usage.credits_remaining.
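Because each response carries its own cost, spend can be tallied on the client without a separate billing call. The following is a minimal sketch that assumes `usage.cost`, `usage.total_tokens`, and `usage.credits_remaining` appear as described above. `CostTracker` is a hypothetical helper, not part of the bundled client.

```python
from collections import defaultdict
from typing import Any, Optional


class CostTracker:
    """Accumulate per-model cost and token counts from response bodies."""

    def __init__(self) -> None:
        self.cost_by_model: dict[str, float] = defaultdict(float)
        self.tokens_by_model: dict[str, int] = defaultdict(int)
        self.credits_remaining: Optional[float] = None

    def record(self, response: dict[str, Any]) -> None:
        usage = response.get("usage", {})
        model = response.get("model", "unknown")
        self.cost_by_model[model] += usage.get("cost", 0.0)
        self.tokens_by_model[model] += usage.get("total_tokens", 0)
        # Keep only the most recent balance reported by the gateway.
        if "credits_remaining" in usage:
            self.credits_remaining = usage["credits_remaining"]

    def total_cost(self) -> float:
        return sum(self.cost_by_model.values())
```

Calling `record` on every response dict gives a running per-model breakdown that can be compared against the approximate rates in the table.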
export AISA_API_KEY="your-key"
See API Reference for complete endpoint documentation.
Generated Mar 1, 2026
Develop a unified chat interface that allows users to switch between different AI models like GPT-4 for technical queries and Claude for creative tasks, enhancing user experience by leveraging each model's strengths without managing multiple API keys. This is ideal for customer support or content creation platforms seeking flexibility and cost-efficiency.
Create a tool that generates and compares responses from multiple AI models (e.g., GPT-4, Gemini, Qwen) for the same prompt, helping businesses evaluate model performance for tasks like marketing copy or research summaries. This supports decision-making in industries reliant on AI-generated content.
Build a service that uses AI models like GPT-4o to analyze images from user uploads or URLs, identifying objects, extracting text, or providing insights for applications in e-commerce, healthcare, or security. The unified API simplifies integration and fallback to other models if needed.
Implement an intelligent routing system that directs simple queries to cheaper models like Gemini Flash and complex ones to advanced models like GPT-4, optimizing operational costs for startups or enterprises using AI at scale. This reduces billing fragmentation and improves resource allocation.
Develop an assistant with built-in fallback logic, automatically switching from a primary model (e.g., GPT-4) to alternatives like Claude or Deepseek in case of failures, ensuring high availability for critical applications in customer service or education.
Offer a subscription-based platform where users pay a monthly fee for access to the unified LLM gateway, with tiered plans based on usage limits or premium models like GPT-4.1. This generates recurring revenue while simplifying billing for clients.
Charge customers based on token usage across all models, with unified cost tracking through the API. This appeals to developers and businesses needing flexible, scalable AI access without upfront commitments, driving revenue from high-volume users.
License the LLM router technology to large enterprises for internal use or resale, providing custom branding, support, and integration services. This targets industries like finance or healthcare seeking secure, tailored AI solutions.
💬 Integration Tip
Start by setting the AISA_API_KEY environment variable and using the OpenAI-compatible endpoint for quick testing with curl or Python SDKs.