# sota-tracker-mcp

Provides daily-updated, authoritative data and APIs tracking state-of-the-art AI models across categories, sourced from LMArena, Artificial Analysis, and HuggingFace.
Install via ClawdBot CLI:

```shell
clawdbot install romancircus/sota-tracker-mcp
```

The definitive open-source database of State-of-the-Art AI models.
Auto-updated daily from LMArena, Artificial Analysis, and HuggingFace.
New AI models land weekly, and keeping track by hand is nearly impossible. This project maintains a daily-updated database of current SOTA models, with JSON/CSV exports and query APIs:
```shell
# Latest data (updated daily)
curl -O https://raw.githubusercontent.com/romancircus/sota-tracker-mcp/main/data/sota_export.json
curl -O https://raw.githubusercontent.com/romancircus/sota-tracker-mcp/main/data/sota_export.csv

# Or clone the full repo
git clone https://github.com/romancircus/sota-tracker-mcp.git
cd sota-tracker-mcp

# Query with sqlite3
sqlite3 data/sota.db "SELECT name, sota_rank FROM models WHERE category='llm_api' ORDER BY sota_rank LIMIT 10"

# List forbidden/outdated models
sqlite3 data/sota.db "SELECT name, reason, replacement FROM forbidden"
```
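For scripted use, the JSON export can be filtered the same way as the sqlite queries above. A minimal sketch, assuming each record carries `name`, `category`, and `sota_rank` fields mirroring the sqlite columns (check the actual export schema before relying on these names):

```python
import json

def top_models(export, category, limit=5):
    """Return the top-ranked model names for a category.

    Assumes 'name', 'category', and 'sota_rank' fields, mirroring the
    sqlite columns queried above; adjust if the real schema differs.
    """
    rows = [m for m in export if m.get("category") == category]
    rows.sort(key=lambda m: m.get("sota_rank", float("inf")))
    return [m["name"] for m in rows[:limit]]

# In practice: export = json.load(open("data/sota_export.json"))
# Inline sample shaped like the export, for illustration:
sample = [
    {"name": "Gemini 3 Pro", "category": "llm_api", "sota_rank": 1},
    {"name": "Claude Opus 4.5", "category": "llm_api", "sota_rank": 3},
    {"name": "Grok 4.1", "category": "llm_api", "sota_rank": 2},
    {"name": "FLUX.2-dev", "category": "image_gen", "sota_rank": 1},
]
print(top_models(sample, "llm_api", limit=2))  # ['Gemini 3 Pro', 'Grok 4.1']
```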
The recommended approach for Claude Code users is static file embedding (lower token cost than MCP):
```shell
# Set up daily auto-update of CLAUDE.md
cp scripts/update_sota_claude_md.py ~/scripts/

# Enable systemd timer (runs at 6 AM daily)
systemctl --user enable --now sota-update.timer

# Or run manually
python ~/scripts/update_sota_claude_md.py --update
```
This embeds a compact SOTA summary directly in your `~/.claude/CLAUDE.md` file.
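Conceptually, the updater renders a compact summary and splices it into the file. A hypothetical sketch of the rendering step (the real script's output format may differ):

```python
def render_sota_section(models_by_category, limit=3):
    """Render a compact SOTA summary block.

    Hypothetical sketch of what update_sota_claude_md.py embeds in
    ~/.claude/CLAUDE.md; the real script's format may differ.
    """
    lines = ["## Current SOTA models (auto-updated)"]
    for cat, names in models_by_category.items():
        lines.append(f"- {cat}: " + ", ".join(names[:limit]))
    return "\n".join(lines)

section = render_sota_section({"llm_api": ["Gemini 3 Pro", "Grok 4.1"]})
print(section)
```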
```shell
# Start the API server
uvicorn rest_api:app --host 0.0.0.0 --port 8000

# Query endpoints
curl "http://localhost:8000/api/v1/models?category=llm_api"
curl "http://localhost:8000/api/v1/forbidden"
curl "http://localhost:8000/api/v1/models/FLUX.1-dev/freshness"
```
MCP support is available but disabled by default (higher token cost). To enable:
```shell
# Edit .mcp.json to add the server config
cat > .mcp.json << 'EOF'
{
  "mcpServers": {
    "sota-tracker": {
      "command": "python",
      "args": ["server.py"]
    }
  }
}
EOF
```
| Source | Data | Update Frequency |
|--------|------|------------------|
| LMArena | LLM Elo rankings (6M+ human votes) | Daily |
| Artificial Analysis | LLM benchmarks, pricing, speed | Daily |
| HuggingFace | Model downloads, trending | Daily |
| Manual curation | Video, Image, Audio, Video2Audio models | As needed |
| Category | Description | Top Models (Feb 2026) |
|----------|-------------|----------------------|
| llm_api | Cloud LLM APIs | Gemini 3 Pro, Grok 4.1, Claude Opus 4.5 |
| llm_local | Local LLMs (GGUF) | Qwen3, Llama 3.3, DeepSeek-V3 |
| llm_coding | Code-focused LLMs | Qwen3-Coder, DeepSeek-V3 |
| image_gen | Image generation | Z-Image-Turbo, FLUX.2-dev, Qwen-Image |
| video | Video generation | LTX-2, Wan 2.2, HunyuanVideo 1.5 |
| video2audio | Video-to-audio (foley) | MMAudio V2 Large |
| tts | Text-to-speech | ChatterboxTTS, F5-TTS |
| stt | Speech-to-text | Whisper Large v3 |
| embeddings | Vector embeddings | BGE-M3 |
| Endpoint | Description |
|----------|-------------|
| GET /api/v1/models?category=X | Get SOTA for a category |
| GET /api/v1/models/:name/freshness | Check if model is current or outdated |
| GET /api/v1/forbidden | List outdated models to avoid |
| GET /api/v1/compare?model_a=X&model_b=Y | Compare two models |
| GET /api/v1/recent?days=30 | Models released in past N days |
| GET /api/v1/recommend?task=chat | Get recommendation for a task |
| GET /health | Health check |
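A thin client for these endpoints needs only the standard library. A sketch assuming the server is running locally on port 8000, as started above:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://localhost:8000/api/v1"

def api_url(path, **params):
    """Build a URL for one of the endpoints in the table above."""
    qs = urlencode(params)
    return f"{BASE}/{path}" + (f"?{qs}" if qs else "")

def get(path, **params):
    """GET a JSON response from a locally running rest_api server."""
    with urlopen(api_url(path, **params)) as resp:
        return json.load(resp)

print(api_url("models", category="llm_api"))
# http://localhost:8000/api/v1/models?category=llm_api
# With the server up: get("recommend", task="chat")
```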
```shell
# Install dependencies
pip install -r requirements.txt
pip install playwright
playwright install chromium

# Run all scrapers
python scrapers/run_all.py --export

# Output:
#   data/sota_export.json
#   data/sota_export.csv
#   data/lmarena_latest.json
```
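The CSV export can then be consumed with the standard library. A sketch assuming the columns mirror the sqlite schema used earlier (`name`, `category`, `sota_rank`); the real export may carry more fields:

```python
import csv
import io

def load_export(text):
    """Parse the CSV export into a list of dicts.

    Column names here mirror the sqlite columns used earlier; the real
    export may include additional fields.
    """
    return list(csv.DictReader(io.StringIO(text)))

# Inline sample standing in for data/sota_export.csv:
sample_csv = "name,category,sota_rank\nBGE-M3,embeddings,1\n"
rows = load_export(sample_csv)
print(rows[0]["name"])  # BGE-M3
```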
This repo uses GitHub Actions (`.github/workflows/daily-scrape.yml`) to run the scrapers on a daily schedule and commit the refreshed exports. To enable this on your fork, turn on Actions in your repository settings.
```
sota-tracker-mcp/
├── server.py                   # MCP server (optional)
├── rest_api.py                 # REST API server
├── init_db.py                  # Database initialization + seeding
├── requirements.txt            # Dependencies
├── data/
│   ├── sota.db                 # SQLite database
│   ├── sota_export.json        # Full JSON export
│   ├── sota_export.csv         # CSV export
│   └── forbidden.json          # Outdated models list
├── scrapers/
│   ├── lmarena.py              # LMArena scraper (Playwright)
│   ├── artificial_analysis.py  # AA scraper (Playwright)
│   └── run_all.py              # Unified runner
├── fetchers/
│   ├── huggingface.py          # HuggingFace API
│   └── cache_manager.py        # Smart caching
└── .github/workflows/
    └── daily-scrape.yml        # GitHub Actions workflow
```
Found a model that's missing or incorrectly ranked? Update the seed data in `init_db.py`, rerun `scrapers/run_all.py`, and submit a PR. See CONTRIBUTING.md for the full developer setup and PR process.
The repo now supports updating `agents.md` files for OpenCode agents:
```shell
# Update your agents.md with latest SOTA data
python update_agents_md.py

# Minimal version (top 1 model per category, lightweight)
python update_agents_md.py --minimal

# Custom categories and limit
python update_agents_md.py --categories llm_local image_gen --limit 3

# Force refresh from sources first
python update_agents_md.py --refresh
```
Add to your cron or systemd timer for daily updates:
```shell
# crontab -e
@daily python ~/Apps/sota-tracker-mcp/update_agents_md.py
```
Or systemd:
```ini
# ~/.config/systemd/user/sota-update.service
[Unit]
Description=Update SOTA models for agents
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/bin/env python3 %h/Apps/sota-tracker-mcp/update_agents_md.py

[Install]
WantedBy=default.target
```
```ini
# ~/.config/systemd/user/sota-update.timer
[Unit]
Description=Daily SOTA data update

[Timer]
OnCalendar=daily
AccuracySec=1h

[Install]
WantedBy=timers.target
```
```shell
# Enable
systemctl --user enable --now sota-update.timer
```
See CONTRIBUTING.md for the full setup guide.
This project aggregates publicly available benchmark data from third-party sources. We do not claim ownership of rankings, Elo scores, or benchmark results.
| Source | Data | Permission |
|--------|------|------------|
| LMArena | Chatbot Arena Elo rankings | robots.txt: Allow: / |
| Artificial Analysis | LLM quality benchmarks | robots.txt: Allow: / (explicitly allows AI crawlers) |
| HuggingFace | Model metadata, downloads | Public API |
| Open LLM Leaderboard | Open-source LLM benchmarks | CC-BY license |
This project only collects from sources whose robots.txt or license permits it, attributes all data to its sources, and claims no ownership over the underlying rankings.
MIT. See `LICENSE` for details.
The code in this repository is MIT-licensed; the data belongs to its respective sources (see attribution above).