llm-eval-router
Shadow-test local Ollama models against a cloud baseline with a multi-judge ensemble. Automatically promotes models when statistically proven equivalent, re...
Install via ClawdBot CLI:
clawdbot install nissan/llm-eval-router
Grade: Fair (based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals).
Calls external URL not in known-safe list:
https://github.com/reddinft/skill-llm-eval-router
Audited Apr 16, 2026 · audit v1.0
Generated Mar 1, 2026
A fintech startup uses the skill to evaluate local models for summarizing quarterly financial reports, comparing them against Claude as a baseline. After collecting 200+ runs, they promote a local model to handle routine summaries, reducing API costs by 80% while maintaining quality through continuous monitoring.
An e-commerce company employs the skill to test local models for classifying customer support tickets into categories like refunds or technical issues. They use the multi-judge ensemble to ensure accuracy, promoting a model after it proves equivalent to cloud models, cutting down on expensive API calls for high-volume ticket processing.
A legal tech firm uses the skill to evaluate local models for analyzing contract clauses, with ground truth from Claude. They apply per-task weight overrides for analyze tasks to prioritize semantic similarity, ensuring reliable promotion of models that match cloud quality for non-critical legal reviews, saving on API expenses.
A social media platform integrates the skill to test local models for filtering inappropriate content, using cloud models as a baseline. After statistical validation, they promote a local model to handle initial filtering, reducing latency and costs while maintaining safety through demotion triggers if quality drops.
A healthcare analytics company uses the skill to evaluate local models for extracting structured data from patient notes, with ground truth from Anthropic. They leverage the deterministic validators for every run and promote models after meeting the 0.95 mean score threshold, enabling cost-effective data processing while ensuring compliance through local inference.
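The promotion and demotion behavior described in these use cases can be illustrated with a minimal sketch. It is not the skill's actual implementation: the function name `decide` and the demotion floor are hypothetical, the 200-run minimum is taken from the "200+ runs" use case, and the 0.95 mean comes from the threshold mentioned above. The listing says models are promoted when "statistically proven equivalent", so a real gate likely applies a statistical test rather than the plain mean comparison used here.

```python
from statistics import mean

MIN_RUNS = 200        # assumption: borrowed from the "200+ runs" use case
PROMOTE_MEAN = 0.95   # mean score threshold named in the use cases
DEMOTE_MEAN = 0.90    # hypothetical demotion floor, not stated in the source


def decide(scores, promoted):
    """Return 'promote', 'demote', or 'hold' for per-run scores in [0, 1]."""
    if not scores:
        return "hold"  # nothing to judge yet
    avg = mean(scores)
    if promoted:
        # An already promoted model is demoted if its quality drops.
        return "demote" if avg < DEMOTE_MEAN else "hold"
    # A candidate needs both enough runs and a high enough mean score.
    if len(scores) >= MIN_RUNS and avg >= PROMOTE_MEAN:
        return "promote"
    return "hold"
```

For example, `decide([0.96] * 200, promoted=False)` returns "promote", while the same scores over only 199 runs return "hold".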
Offer the skill as a cloud-based service with tiered pricing based on usage volume, providing automated model evaluation and routing for enterprises. Revenue comes from monthly subscriptions, with premium tiers including advanced analytics and custom task type configurations.
Sell licenses for on-premise deployment to organizations with strict data privacy requirements, such as government or healthcare. Revenue is generated through one-time license fees and annual support contracts, with optional add-ons for integration with existing AI infrastructure.
Provide consulting services to help companies implement and customize the skill for specific use cases, such as optimizing task weights or integrating with local Ollama models. Revenue comes from project-based fees and ongoing maintenance agreements, targeting businesses new to AI cost optimization.
💬 Integration Tip
Ensure Ollama is running with capable models and set up per-task weight overrides in config files to align with specific evaluation needs, such as reducing structural weight for analyze tasks.
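As a rough illustration of per-task weight overrides, the sketch below merges task-specific weights over a default set and combines component scores into one value. The component names (`structural`, `semantic`), the weight values, and both function names are assumptions made for illustration; the skill's real config format and keys are not shown in this listing.

```python
# Hypothetical default weights for the score components.
DEFAULT_WEIGHTS = {"structural": 0.4, "semantic": 0.6}

# Hypothetical override: reduce structural weight for "analyze" tasks.
TASK_OVERRIDES = {"analyze": {"structural": 0.1, "semantic": 0.9}}


def weights_for(task):
    """Merge any task override over the defaults, then normalize to sum to 1."""
    merged = {**DEFAULT_WEIGHTS, **TASK_OVERRIDES.get(task, {})}
    total = sum(merged.values())
    return {name: value / total for name, value in merged.items()}


def combined_score(task, component_scores):
    """Weighted sum of component scores (each expected in [0, 1])."""
    weights = weights_for(task)
    return sum(weights[name] * component_scores[name] for name in weights)
```

With these example weights, an output that scores 0.0 structurally and 1.0 semantically gets a combined score of about 0.9 for an analyze task, but only about 0.6 for a task that uses the defaults.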
Scored Apr 19, 2026