llmrouter
Intelligent LLM proxy that routes requests to appropriate models based on complexity. Save money by using cheaper models for simple tasks. Tested with Anthropic, OpenAI, Gemini, Kimi/Moonshot, and Ollama.
Install via ClawdBot CLI:
clawdbot install alexrudloff/llmrouter
An intelligent proxy that classifies incoming requests by complexity and routes them to appropriate LLM models. Use cheaper, faster models for simple tasks and reserve expensive models for complex ones.
Works with OpenClaw to reduce token usage and API costs by routing simple requests to smaller models.
Status: Tested with Anthropic, OpenAI, Google Gemini, Kimi/Moonshot, and Ollama.
# Clone if not already present
git clone https://github.com/alexrudloff/llmrouter.git
cd llmrouter
# Create virtual environment (required on modern Python)
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Pull classifier model (if using local classification)
ollama pull qwen2.5:3b
# Copy and customize config
cp config.yaml.example config.yaml
# Edit config.yaml with your API key and model preferences
# Start the server
source venv/bin/activate
python server.py
# In another terminal, test health endpoint
curl http://localhost:4001/health
# Should return: {"status": "ok", ...}
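The same health check can be scripted, for example before pointing other tools at the router. This is a minimal sketch using only the standard library; the function names `parse_health` and `check_health` are hypothetical, and the only assumption about the response is the `"status": "ok"` field shown above.

```python
import json
import urllib.request


def parse_health(body: bytes) -> bool:
    """Return True when the JSON body reports status "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(url: str = "http://localhost:4001/health",
                 timeout: float = 2.0) -> bool:
    """Query the health endpoint; any connection problem counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_health(resp.read())
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError
        return False
```

Calling `check_health()` returns `False` rather than raising when the server is down, which makes it convenient for startup scripts.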
python server.py
Options:
--port PORT - Port to listen on (default: 4001)
--host HOST - Host to bind (default: 127.0.0.1)
--config PATH - Config file path (default: config.yaml)
--log - Enable verbose logging
--openclaw - Enable OpenClaw compatibility (rewrites model name in system prompt)
Edit config.yaml to customize:
# Anthropic routing
models:
  super_easy: "anthropic:claude-haiku-4-5-20251001"
  easy: "anthropic:claude-haiku-4-5-20251001"
  medium: "anthropic:claude-sonnet-4-20250514"
  hard: "anthropic:claude-opus-4-20250514"
  super_hard: "anthropic:claude-opus-4-20250514"
# OpenAI routing
models:
  super_easy: "openai:gpt-4o-mini"
  easy: "openai:gpt-4o-mini"
  medium: "openai:gpt-4o"
  hard: "openai:o3-mini"
  super_hard: "openai:o3"
# Google Gemini routing
models:
  super_easy: "google:gemini-2.0-flash"
  easy: "google:gemini-2.0-flash"
  medium: "google:gemini-2.0-flash"
  hard: "google:gemini-2.0-flash"
  super_hard: "google:gemini-2.0-flash"
Note: Reasoning models are auto-detected and called with the correct API parameters.
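Each entry in `models` is a `provider:model` string keyed by complexity level. The sketch below shows one way such a mapping could be resolved; it is not the router's actual code, and `ModelSpec`, `parse_model_spec`, and `resolve_model` are hypothetical names. It splits on the first colon only, so local model names that contain a colon (such as `qwen2.5:3b`) stay intact.

```python
from typing import NamedTuple

# The five complexity levels used as keys in config.yaml.
LEVELS = ("super_easy", "easy", "medium", "hard", "super_hard")


class ModelSpec(NamedTuple):
    provider: str
    model: str


def parse_model_spec(spec: str) -> ModelSpec:
    """Split "provider:model" on the first colon only."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return ModelSpec(provider, model)


def resolve_model(models: dict[str, str], level: str) -> ModelSpec:
    """Look up the configured model for a complexity level."""
    if level not in LEVELS:
        raise ValueError(f"unknown complexity level: {level!r}")
    return parse_model_spec(models[level])
```

For example, `resolve_model({"medium": "openai:gpt-4o"}, "medium")` yields provider `openai` and model `gpt-4o`.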
Five options for classifying request complexity:
Local (default) - Free, requires Ollama:
classifier:
  provider: "local"
  model: "qwen2.5:3b"
Anthropic - Uses Haiku, fast and cheap:
classifier:
  provider: "anthropic"
  model: "claude-haiku-4-5-20251001"
OpenAI - Uses GPT-4o-mini:
classifier:
  provider: "openai"
  model: "gpt-4o-mini"
Google - Uses Gemini:
classifier:
  provider: "google"
  model: "gemini-2.0-flash"
Kimi - Uses Moonshot:
classifier:
  provider: "kimi"
  model: "moonshot-v1-8k"
Use a remote classifier (anthropic/openai/google/kimi) if your machine can't run local models.
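A typo in the `classifier` section is easy to miss until the first request arrives. The sketch below checks an already-loaded config dictionary against the provider names listed above. It is illustrative only: `validate_classifier` is a hypothetical helper, and treating a missing provider as `local` is an assumption based on local being described as the default.

```python
# Provider names taken from the classifier options listed above.
CLASSIFIER_PROVIDERS = {"local", "anthropic", "openai", "google", "kimi"}


def validate_classifier(config: dict) -> tuple[str, str]:
    """Return (provider, model) or raise ValueError for a bad classifier section."""
    section = config.get("classifier") or {}
    provider = section.get("provider", "local")  # assumption: local is the default
    model = section.get("model")
    if provider not in CLASSIFIER_PROVIDERS:
        raise ValueError(f"unknown classifier provider: {provider!r}")
    if not model:
        raise ValueError("classifier.model is required")
    return provider, model
```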
anthropic:claude-* - Anthropic Claude models (tested)
openai:gpt-*, openai:o1-*, openai:o3-* - OpenAI models (tested)
google:gemini-* - Google Gemini models (tested)
kimi:kimi-k2.5, kimi:moonshot-* - Kimi/Moonshot models (tested)
local:model-name - Local Ollama models (tested)
| Level | Use Case | Default Model |
|-------|----------|---------------|
| super_easy | Greetings, acknowledgments | Haiku |
| easy | Simple Q&A, reminders | Haiku |
| medium | Coding, emails, research | Sonnet |
| hard | Complex reasoning, debugging | Opus |
| super_hard | System architecture, proofs | Opus |
Edit ROUTES.md to tune how messages are classified. The classifier reads the table in this file to determine complexity levels.
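When editing the routing table it can help to confirm that it still parses as a table. The exact layout of ROUTES.md is not shown here, so the sketch below assumes a standard pipe-delimited markdown table like the level table above. `parse_markdown_table` is a hypothetical helper, not part of the project.

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse a pipe-delimited markdown table into a list of row dictionaries."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first table row holds the column names
            continue
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue  # separator row such as |-------|-------|
        rows.append(dict(zip(header, cells)))
    return rows
```

Each returned row maps column names to cell text, so a quick check such as `{row["Level"] for row in rows}` shows which levels the table currently covers (assuming a `Level` column).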
The router exposes an OpenAI-compatible API:
curl http://localhost:4001/v1/chat/completions \
-H "Authorization: Bearer $ANTHROPIC_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llm-router",
"messages": [{"role": "user", "content": "Hello!"}]
}'
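The same request can be built from Python with the standard library. This is a sketch under the assumption that the router accepts exactly the request shown in the curl example; `build_chat_request` and `send` are hypothetical helper names, and the response shape is assumed to follow the usual OpenAI chat completions format.

```python
import json
import urllib.request


def build_chat_request(prompt: str, api_key: str,
                       base_url: str = "http://localhost:4001/v1") -> urllib.request.Request:
    """Build a POST request equivalent to the curl example."""
    body = {
        "model": "llm-router",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `send(build_chat_request("Hello!", os.environ["ANTHROPIC_API_KEY"]))` should return the decoded response; if it follows the OpenAI format, the reply text is under `choices[0]["message"]["content"]`.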
python classifier.py "Write a Python sort function"
# Output: medium
python classifier.py --test
# Runs test suite
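The classifier CLI can also be called from a script. The sketch below assumes the command prints a single level name on stdout, as in the example output above. `normalize_level` and `classify` are hypothetical helpers, and falling back to `medium` on unexpected output is a design choice of this sketch, not documented router behavior.

```python
import subprocess
import sys

LEVELS = ("super_easy", "easy", "medium", "hard", "super_hard")


def normalize_level(raw: str, default: str = "medium") -> str:
    """Map raw classifier output to a known level, falling back to a default."""
    level = raw.strip().lower()
    return level if level in LEVELS else default


def classify(message: str, script: str = "classifier.py") -> str:
    """Run the classifier script with the current interpreter and return the level."""
    result = subprocess.run(
        [sys.executable, script, message],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return normalize_level(result.stdout)
```

Run this from the llmrouter directory with the venv active so that `sys.executable` is the venv's interpreter.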
Create ~/Library/LaunchAgents/com.llmrouter.plist:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.llmrouter</string>
  <key>ProgramArguments</key>
  <array>
    <string>/path/to/llmrouter/venv/bin/python</string>
    <string>/path/to/llmrouter/server.py</string>
    <string>--openclaw</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>WorkingDirectory</key>
  <string>/path/to/llmrouter</string>
  <key>StandardOutPath</key>
  <string>/path/to/llmrouter/logs/stdout.log</string>
  <key>StandardErrorPath</key>
  <string>/path/to/llmrouter/logs/stderr.log</string>
</dict>
</plist>
Important: Replace /path/to/llmrouter with your actual install path. The plist must point to the venv's Python interpreter, not the system Python.
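Because every path in the plist must be absolute and consistent, generating the file can be less error-prone than editing it by hand. The sketch below uses Python's standard `plistlib` to produce the same keys as the plist above. `build_launch_agent` is a hypothetical helper, and `/opt/llmrouter` in the usage note is only a placeholder path.

```python
import plistlib
from pathlib import PurePosixPath


def build_launch_agent(install_dir: str, openclaw: bool = True) -> bytes:
    """Return launchd plist XML for an llmrouter checkout at install_dir."""
    root = PurePosixPath(install_dir)
    if not root.is_absolute():
        raise ValueError("install_dir must be an absolute path")
    # Use the venv interpreter, not the system Python.
    args = [str(root / "venv" / "bin" / "python"), str(root / "server.py")]
    if openclaw:
        args.append("--openclaw")
    plist = {
        "Label": "com.llmrouter",
        "ProgramArguments": args,
        "RunAtLoad": True,
        "KeepAlive": True,
        "WorkingDirectory": str(root),
        "StandardOutPath": str(root / "logs" / "stdout.log"),
        "StandardErrorPath": str(root / "logs" / "stderr.log"),
    }
    return plistlib.dumps(plist)
```

Writing the result of `build_launch_agent("/opt/llmrouter")` to ~/Library/LaunchAgents/com.llmrouter.plist gives an equivalent file; `plistlib` sorts keys alphabetically, which launchd does not care about.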
# Create logs directory
mkdir -p /path/to/llmrouter/logs
# Load the service
launchctl load ~/Library/LaunchAgents/com.llmrouter.plist
# Verify it's running
curl http://localhost:4001/health
# To stop/restart
launchctl unload ~/Library/LaunchAgents/com.llmrouter.plist
launchctl load ~/Library/LaunchAgents/com.llmrouter.plist
Add the router as a provider in ~/.openclaw/openclaw.json:
{
  "models": {
    "providers": {
      "localrouter": {
        "baseUrl": "http://localhost:4001/v1",
        "apiKey": "via-router",
        "api": "openai-completions",
        "models": [
          {
            "id": "llm-router",
            "name": "LLM Router (Auto-routes by complexity)",
            "reasoning": false,
            "input": ["text", "image"],
            "cost": {
              "input": 0,
              "output": 0,
              "cacheRead": 0,
              "cacheWrite": 0
            },
            "contextWindow": 200000,
            "maxTokens": 8192
          }
        ]
      }
    }
  }
}
Note: Cost is set to 0 because actual costs depend on which model the router selects. The router logs which model handled each request.
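If openclaw.json already defines other providers, the router entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that on an already-loaded dictionary. `add_router_provider` is a hypothetical helper, and the provider definition simply mirrors the JSON above.

```python
import copy

# Mirrors the "localrouter" provider definition shown above.
ROUTER_PROVIDER = {
    "baseUrl": "http://localhost:4001/v1",
    "apiKey": "via-router",
    "api": "openai-completions",
    "models": [
        {
            "id": "llm-router",
            "name": "LLM Router (Auto-routes by complexity)",
            "reasoning": False,
            "input": ["text", "image"],
            "cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0},
            "contextWindow": 200000,
            "maxTokens": 8192,
        }
    ],
}


def add_router_provider(config: dict) -> dict:
    """Return a copy of the config with the localrouter provider added."""
    merged = copy.deepcopy(config)  # leave the caller's dictionary untouched
    providers = merged.setdefault("models", {}).setdefault("providers", {})
    providers["localrouter"] = copy.deepcopy(ROUTER_PROVIDER)
    return merged
```

A typical use would load ~/.openclaw/openclaw.json with `json.load`, pass the result through `add_router_provider`, and write it back with `json.dump(..., indent=2)` after keeping a backup of the original file.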
To use the router for all agents by default, add:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "localrouter/llm-router"
      }
    }
  }
}
If your config.yaml uses an Anthropic OAuth token from OpenClaw's ~/.openclaw/auth-profiles.json, the router automatically handles Claude Code identity headers.
If using with OpenClaw, you MUST start the server with --openclaw:
python server.py --openclaw
This flag enables the compatibility behavior OpenClaw requires, including rewriting the model name in the system prompt.
Without this flag, you may encounter errors when using the router with OpenClaw.
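curl http://localhost:4001/health - check that the server is responding
cat config.yaml - review the active configuration
python classifier.py "your message" - see how a message is classified
python classifier.py --test - run the classifier test suite
python server.py - start the server again
tail -f logs/stdout.log - follow the server log
Recent Python installs (3.11+) often refuse system-wide pip installs, so a virtual environment is required. Create one: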
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Server isn't running. Start it:
source venv/bin/activate && python server.py
Edit ROUTES.md to tune classification rules. The classifier reads this file to determine complexity levels.
Ensure Ollama is running and the model is pulled:
ollama serve # Start Ollama if not running
ollama pull qwen2.5:3b
Ensure your token in config.yaml starts with sk-ant-oat. The router auto-detects OAuth tokens and adds required identity headers.
Check logs and ensure paths are absolute:
cat ~/Library/LaunchAgents/com.llmrouter.plist # Verify paths
cat /path/to/llmrouter/logs/stderr.log # Check for errors
Generated Mar 1, 2026
A company uses the LLM router to classify incoming customer queries by complexity, routing simple FAQs to cheaper models like Haiku or GPT-4o-mini, while directing technical issues to more capable models like Claude Opus. This reduces API costs by up to 70% while maintaining response quality for complex cases.
A social media platform employs the router to screen user-generated content, using local Ollama models for basic profanity detection and lightweight classification, then escalating nuanced hate speech or legal concerns to premium models. This balances cost-efficiency with accuracy in high-stakes moderation.
An online learning platform integrates the router to assess student questions, sending basic math problems to free local models and routing advanced physics or coding queries to OpenAI's o3-mini. This enables scalable, personalized tutoring without overspending on simple interactions.
A telehealth app uses the router to classify patient descriptions, with common symptoms handled by fast, low-cost models and complex medical histories forwarded to high-accuracy models for preliminary analysis. It ensures reliable triage while controlling operational expenses in healthcare services.
A fintech firm applies the router to process financial queries, using Gemini Flash for routine data summaries and Claude Sonnet for in-depth risk assessment or regulatory compliance checks. This optimizes model usage across varying analytical depths in finance workflows.
Offer the LLM router as a managed cloud service with tiered pricing based on request volume and model usage, targeting startups and enterprises seeking cost-effective AI routing. Revenue streams include monthly subscriptions and pay-per-request fees, with potential upsells for premium support.
Sell enterprise licenses for self-hosted deployments, providing customization, security compliance, and integration support for large organizations in regulated industries like finance or healthcare. Revenue comes from one-time license fees and annual maintenance contracts, ensuring long-term client relationships.
Provide consulting services to help businesses implement and optimize the router within their existing AI stacks, including configuration tuning, performance monitoring, and workflow automation. Revenue is generated through project-based fees and ongoing retainer agreements for technical support.
💬 Integration Tip
Start with a local classifier using Ollama to minimize costs, then gradually integrate remote providers like Anthropic for higher accuracy in production environments.