llm-supervisor
Graceful rate limit handling with Ollama fallback. Notifies on rate limits and offers a local model switch, with confirmation for code tasks.
Install via ClawdBot CLI:
clawdbot install dhardie/llm-supervisor
Handles rate limits and model fallbacks gracefully.
When I encounter rate limits or overload errors from cloud providers (Anthropic, OpenAI):
Before using local models for code generation, ask:
"Cloud is rate-limited. Switch to local Ollama (qwen2.5:7b)? Reply 'yes' to confirm."
For simple queries (chat, summaries), I can switch without confirmation if the user previously approved.
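The decision flow above can be sketched as a small shell function. This is an illustrative model only: the function name, the status codes (429 for rate limits, 529 for provider overload), and the argument names are assumptions, not the skill's actual implementation.

```shell
#!/bin/sh
# Decide how to react to a cloud provider response.
# Returns "ask"   -> prompt the user before switching to local Ollama
#         "auto"  -> switch silently (simple query, prior approval)
#         "cloud" -> no rate limit, stay on the cloud provider
fallback_action() {
  status="$1"      # HTTP status from the cloud provider
  task="$2"        # "code" or "simple"
  approved="$3"    # "yes" if the user previously approved auto-switching

  case "$status" in
    429|529)
      if [ "$task" = "code" ]; then
        echo "ask"          # code generation always requires confirmation
      elif [ "$approved" = "yes" ]; then
        echo "auto"         # simple query with prior approval
      else
        echo "ask"
      fi
      ;;
    *)
      echo "cloud"
      ;;
  esac
}
```

For example, `fallback_action 429 code yes` yields "ask", while `fallback_action 429 simple yes` yields "auto".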
/llm status: report the current state.
/llm switch local: manually switch to Ollama for the session.
/llm switch cloud: switch back to the cloud provider.
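A minimal dispatcher for these commands might look like the sketch below. The function name and the `LLM_PROVIDER` variable are hypothetical; the real skill tracks provider state in agent memory, not shell variables.

```shell
#!/bin/sh
# Illustrative /llm command dispatcher (names are assumptions).
LLM_PROVIDER="cloud"   # reset to cloud at session start

llm_cmd() {
  case "$1" in
    status)
      echo "provider=$LLM_PROVIDER" ;;
    switch)
      case "$2" in
        local) LLM_PROVIDER="local"; echo "switched to Ollama" ;;
        cloud) LLM_PROVIDER="cloud"; echo "switched to cloud provider" ;;
        *)     echo "usage: /llm switch local|cloud" ;;
      esac ;;
    *)
      echo "usage: /llm status|switch" ;;
  esac
}
```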
# Check available models
ollama list
# Run a query
ollama run qwen2.5:7b "your prompt here"
# For longer prompts, use stdin
echo "your prompt" | ollama run qwen2.5:7b
Check available models with ollama list. Configured default: qwen2.5:7b.
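A pre-flight check can confirm the default model is present before the first fallback. The helper name `model_available` is hypothetical; it simply greps the `ollama list` output for the model tag, and `ollama pull` fetches a missing model.

```shell
#!/bin/sh
# Hypothetical helper: does the `ollama list` output mention the model?
model_available() {
  list_output="$1"   # captured output of `ollama list`
  model="$2"         # e.g. qwen2.5:7b
  printf '%s\n' "$list_output" | grep -q "$model"
}

# Usage sketch: pull the configured default only if it is missing.
if command -v ollama >/dev/null 2>&1; then
  if ! model_available "$(ollama list)" "qwen2.5:7b"; then
    ollama pull qwen2.5:7b
  fi
fi
```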
Track in memory during session:
currentProvider: "cloud" | "local"
lastRateLimitAt: timestamp or null
localConfirmedForCode: boolean
Reset to cloud at session start.
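The session state fields can be mirrored as shell variables for illustration. This is a sketch only: the skill keeps this state in agent memory, and the function names here are assumptions.

```shell
#!/bin/sh
# Session state mirroring the fields above (names match the listed fields).
reset_session() {
  currentProvider="cloud"
  lastRateLimitAt=""          # empty string stands in for null
  localConfirmedForCode="false"
}

# Record the time of the most recent rate limit.
on_rate_limit() {
  lastRateLimitAt="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

# User confirmed local code generation: remember it and switch.
confirm_local_for_code() {
  localConfirmedForCode="true"
  currentProvider="local"
}

reset_session
```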
Generated Mar 1, 2026
A development team uses the LLM Supervisor to handle API rate limits from cloud providers like OpenAI during code generation tasks. When rate limits are hit, the agent notifies the team and offers a fallback to a local Ollama model; requiring confirmation before switching keeps the workflow uninterrupted without compromising code quality.
Researchers rely on cloud LLMs for data analysis and summarization but face rate limits during peak usage. The LLM Supervisor automatically switches to local Ollama for non-code tasks like text summarization after user approval, maintaining productivity while managing costs and API constraints effectively.
A company uses AI agents for customer support chats, which depend on cloud LLMs for responses. During high traffic, rate limits can disrupt service. The LLM Supervisor detects limits, notifies operators, and can switch to local models for simple queries, ensuring continuous support with minimal downtime.
Content creators use AI for generating articles and social media posts, often hitting API rate limits. The LLM Supervisor offers a fallback to local Ollama models for non-code tasks like drafting, allowing the agency to maintain output without extra costs or delays, with manual switches for code-related content.
Offer the LLM Supervisor as a cloud-based service with tiered pricing based on usage limits and features like advanced monitoring. Revenue comes from monthly subscriptions, targeting businesses that rely heavily on AI APIs and need reliable fallback options to avoid disruptions.
Sell perpetual licenses or annual contracts to large organizations for on-premise deployment, including customization and support services. This model caters to industries with strict data privacy requirements, ensuring local fallback without external dependencies.
Provide a free version with basic rate limit handling and Ollama integration, then charge for advanced features like detailed analytics, multi-provider support, and priority support. This attracts individual developers and small teams, converting them to paid plans as needs grow.
💬 Integration Tip
Ensure Ollama is installed and configured with default models like qwen2.5:7b before deployment, and use the /llm status command to monitor provider states during integration testing.