clawd-throttle

Routes LLM requests to the cheapest capable model across 8 providers (Anthropic, Google, OpenAI, DeepSeek, xAI, Moonshot, Mistral, Ollama) and 25+ models. Scores prompts on 8 dimensions in under 1ms, supports three routing modes (eco, standard, gigachad), and logs all decisions for cost tracking.
Install via ClawdBot CLI:
```
clawdbot install liekzejaws/clawd-throttle
```

Then set up Clawd Throttle (API keys + routing mode) with the setup script described below.
Route every LLM request to the cheapest model that can handle it. Stop
paying Opus prices for "hello" and "summarize this."
Supports 8 providers and 25+ models: Anthropic (Claude), Google
(Gemini), OpenAI (GPT / o-series), xAI (Grok), DeepSeek, Moonshot (Kimi),
Mistral, and Ollama (local).
The classifier scores every prompt on 8 dimensions (including reasoning markers, simplicity indicators, multi-step patterns, question count, system prompt complexity, and conversation depth) in under 1 millisecond, then routes to the cheapest capable model based on your active mode and configured providers.
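As a rough sketch of how a sub-millisecond heuristic classifier like this could work (the regexes, weights, and thresholds below are illustrative assumptions, not the skill's actual implementation):

```ts
// Hypothetical sketch of a heuristic prompt classifier; weights and thresholds are made up.
type Tier = "simple" | "standard" | "complex";

function classifyPrompt(prompt: string, systemPrompt = "", turns = 1): Tier {
  const count = (re: RegExp) => (prompt.match(re) ?? []).length;

  // A few of the scoring dimensions mentioned above, as cheap regex/length checks.
  const reasoningMarkers = count(/\b(prove|derive|step by step|why|trade-?offs?)\b/gi);
  const simplicityIndicators = count(/\b(hi|hello|thanks|summarize|translate)\b/gi);
  const multiStepPatterns = count(/\b(first|then|finally)\b/gi) + count(/^\s*\d+\./gm);
  const questionCount = count(/\?/g);
  const systemPromptComplexity = Math.min(systemPrompt.length / 500, 3);
  const conversationDepth = Math.min(turns / 5, 3);

  // Weighted sum: complexity signals push the score up, simplicity signals pull it down.
  const score =
    2 * reasoningMarkers +
    1.5 * multiStepPatterns +
    questionCount +
    systemPromptComplexity +
    conversationDepth -
    2 * simplicityIndicators;

  if (score <= 0) return "simple";
  if (score <= 4) return "standard";
  return "complex";
}
```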
| Mode | Simple | Standard | Complex |
|------|--------|----------|---------|
| eco | Grok 4.1 Fast | Gemini Flash | Haiku |
| standard | Grok 4.1 Fast | Haiku | Sonnet |
| gigachad | Haiku | Sonnet | Opus 4.6 |
Each cell shows the first-choice model. The router tries a preference list
and falls through to the next available provider if the first is not
configured.
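A minimal sketch of that preference-list fallback, assuming routing tables keyed by mode and complexity tier (the model IDs and the `pickModel` helper are hypothetical):

```ts
// Hypothetical preference lists mirroring the table above; eco mode shown, other modes elided.
const PREFERENCES: Record<string, Record<string, string[]>> = {
  eco: {
    simple: ["xai/grok-4.1-fast", "google/gemini-flash", "anthropic/claude-haiku"],
    standard: ["google/gemini-flash", "anthropic/claude-haiku"],
    complex: ["anthropic/claude-haiku", "anthropic/claude-sonnet"],
  },
  // standard and gigachad modes omitted for brevity
};

// Walk the list and return the first model whose provider has an API key configured.
function pickModel(mode: string, tier: string, configured: Set<string>): string {
  for (const model of PREFERENCES[mode]?.[tier] ?? []) {
    if (configured.has(model.split("/")[0])) return model;
  }
  throw new Error(`no configured provider can serve ${mode}/${tier}`);
}
```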
| Command | What It Does |
|---------|-------------|
| route_request | Send a prompt and get a response from the cheapest capable model |
| classify_prompt | Analyze prompt complexity without making an LLM call |
| get_routing_stats | View cost savings and model distribution stats |
| get_config | View current configuration (keys redacted) |
| set_mode | Change routing mode at runtime |
| get_recent_routing_log | Inspect recent routing decisions |
Use /opus, /sonnet, /haiku, /flash, or /grok-fast to force a specific model.
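For example, a routed call with an optional override and a dry-run classification might look roughly like this (the invocation shape is an assumption; check the skill's actual command interface):

```ts
// Hypothetical invocations of the commands above; the `throttle` client object is assumed.
const routed = await throttle.route_request({
  prompt: "Summarize this changelog in three bullets.",
  // override: "/haiku",  // optionally force a specific model
});

// Classify without spending tokens on an LLM call.
const analysis = await throttle.classify_prompt({
  prompt: "Prove the algorithm terminates, then derive its time complexity.",
});
```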
To configure API keys and pick a routing mode, run:

```
npm run setup
```
Handles routine inquiries and FAQ responses using simple models like Grok 4.1 Fast in eco mode, reducing costs while maintaining quality. For complex issues, it automatically routes to higher-tier models like Haiku or Sonnet, ensuring accurate resolutions without manual intervention.
Processes large volumes of articles by classifying prompts for simplicity and routing to cost-effective models like Gemini Flash. For in-depth analysis or multi-step reasoning, it escalates to models like Sonnet, optimizing speed and expense across varied content types.
Analyzes code snippets and bug reports using the classifier to detect complexity markers, routing simple checks to cheaper models and intricate logic problems to advanced ones. Supports overrides for specific models when developers need targeted expertise.
Routes student questions based on complexity, using eco mode for basic queries and gigachad mode for advanced topics. Logs decisions to track costs and model performance, enabling scalable, affordable personalized learning support.
Classifies patient inquiries to route simple informational requests to low-cost models and complex medical reasoning to higher-tier models. Ensures privacy by hashing prompts and keeping data local, suitable for sensitive health applications.
Offers tiered pricing based on routing modes (eco, standard, gigachad) and usage volume, with premium features like advanced logging and override capabilities. Targets businesses seeking to optimize LLM costs without sacrificing performance.
Provides custom setup, API key configuration, and mode optimization for enterprises integrating Clawd Throttle into existing workflows. Generates revenue through project-based fees and ongoing support contracts.
Licenses the routing technology to other AI platforms or developers, allowing them to rebrand and offer cost-efficient LLM access. Revenue comes from licensing fees and a share of savings passed to end-users.
💬 Integration Tip
Ensure required API keys are set up first, and use the setup script to configure routing modes based on your cost and performance needs.
Use CodexBar CLI local cost usage to summarize per-model usage for Codex or Claude, including the current (most recent) model or a full model breakdown. Trigger when asked for model-level usage/cost data from codexbar, or when you need a scriptable per-model summary from codexbar cost JSON.
Gemini CLI for one-shot Q&A, summaries, and generation.
Research any topic from the last 30 days on Reddit + X + Web, synthesize findings, and write copy-paste-ready prompts. Use when the user wants recent social/web research on a topic, asks "what are people saying about X", or wants to learn current best practices. Requires OPENAI_API_KEY and/or XAI_API_KEY for full Reddit+X access, falls back to web search.
Check Antigravity account quotas for Claude and Gemini models. Shows remaining quota and reset times with ban detection.
Manages free AI models from OpenRouter for OpenClaw. Automatically ranks models by quality, configures fallbacks for rate-limit handling, and updates openclaw.json. Use when the user mentions free AI, OpenRouter, model switching, rate limits, or wants to reduce AI costs.