skillbench
Track skill versions, benchmark performance, compare improvements, and get self-improvement signals. Integrates with tasktime and ClawVault.
Install via ClawdBot CLI:
clawdbot install G9Pedro/skillbench
Self-improving skill ecosystem for AI agents.
Track skill versions, benchmark performance, compare improvements, and get signals on what to fix next.
Part of the ClawVault ecosystem | tasktime | ClawHub
npm install -g @versatly/skillbench
1. Use a skill → skillbench use github@1.0.0
2. Do the task → tt start "Create PR" && ... && tt stop
3. Record result → skillbench record "Create PR" --success
4. Check scores → skillbench score github
5. Improve skill → Update skill, bump version
6. Repeat → Compare v1.0.0 vs v1.1.0
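The loop above can be sketched as a small shell function. This is only an illustration: the `run` and `benchmark_cycle` helpers and the `DRY_RUN` variable are hypothetical names, not part of skillbench, and the only skillbench and tasktime commands used are the ones listed in this document. With `DRY_RUN=1` (the default) the commands are printed rather than executed, so the sketch runs without either tool installed.

```shell
#!/usr/bin/env bash
# Sketch of one benchmark cycle, built only from commands shown in this document.
# run, benchmark_cycle and DRY_RUN are hypothetical helper names.

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# benchmark_cycle <skill@version> <task name>
benchmark_cycle() {
  local skill_version="$1" task="$2"
  local skill="${skill_version%@*}"          # "github@1.0.0" -> "github"

  run skillbench use "$skill_version"        # 1. set the active skill version
  run tt start "$task"                       # 2. start timing the task
  # ... do the actual work here ...
  run tt stop
  run skillbench record "$task" --success    # 3. record the result
  run skillbench score "$skill"              # 4. check the score
}

benchmark_cycle "github@1.0.0" "Create PR"
```

After bumping the skill version, calling `benchmark_cycle "github@1.1.0" "Create PR"` and then `skillbench compare github@1.0.0 github@1.1.0` would cover steps 5 and 6.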
skillbench use github@1.2.0 # Set active skill version
skillbench skills # List tracked skills + signals
# Auto-pulls duration from tasktime
skillbench record "Create PR" --success
# Manual duration
skillbench record "Create PR" --duration 45s --success
# Record failures
skillbench record "Create PR" --fail --error-type "auth-error"
skillbench score # All skills with grades
skillbench score github # Single skill
skillbench compare github@1.0.0 github@1.1.0
skillbench export --format markdown
skillbench export --format json
skillbench dashboard # Generate HTML dashboard
skillbench dashboard --open # Generate and open in browser
skillbench test tasktime@1.1.0 # Run smoke test
skillbench test tasktime@1.1.0 --suite full # Run named suite
skillbench test tasktime@1.1.0 --dry-run # Test without recording
skillbench sync --clawhub # Import installed skills
skillbench sync --vault # Sync to ClawVault
skillbench sync --all # Everything
skillbench health # Overall health report with alerts
skillbench watch --once # Run all test suites once
skillbench watch --interval 300 # Continuous monitoring every 5 min
skillbench improve # Get suggestions for weakest skill
skillbench improve github # Improvement plan for specific skill
skillbench trend tasktime --days 30 # Performance trend over time
skillbench leaderboard # Compare agents (multi-agent setups)
skillbench schedule --interval 60 # Generate cron config for auto-testing
skillbench baseline tasktime --set # Set baseline from current performance
skillbench baseline --list # List all baselines
skillbench baseline --check # Check all baselines (CI-friendly, exits 1 if failing)
skillbench baseline tasktime --remove # Remove a baseline
skillbench ci # Run all tests + baseline checks
skillbench ci --json # JSON output for automation
skillbench badge # Generate shields.io badges for README
Copy examples/github-action.yml for a ready-to-use GitHub Actions workflow.
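Because `skillbench baseline --check` is described above as exiting with status 1 on failure, a CI step can gate on its exit status. The wrapper below is a minimal sketch of that idea. `ci_gate` is a hypothetical helper, not something skillbench ships, and it works with any command that signals failure through a non-zero exit status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check command and turn its exit status
# into an explicit pass/fail message for CI logs.
ci_gate() {
  if "$@"; then
    echo "PASS: $*"
  else
    local rc=$?                      # exit status of the failed check
    echo "FAIL ($rc): $*" >&2
    return 1
  fi
}

# Intended use in a CI step (requires skillbench to be installed):
#   ci_gate skillbench baseline --check || exit 1
```

Keeping the check command as an argument means the same wrapper could front other checks whose exit codes are known to reflect failure.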
| Grade | Score | Meaning |
|-------|-------|---------|
| A+ | 95-100 | Elite performance |
| A | 85-94 | Excellent |
| B | 70-84 | Good |
| C | 50-69 | Needs work |
| D | <50 | Broken |
Scores are based on: Success Rate (40%), Avg Duration (30%), Consistency (20%), Trend (10%)
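As a worked example of the weighting and the grade bands, the sketch below combines four component scores into one number and maps it to a letter. The weights and thresholds come from this document. Everything else is an assumption made for illustration: that each component is already normalized to a 0-100 integer, that the result is truncated to an integer, and the function names `weighted_score` and `grade`. How skillbench actually normalizes duration, consistency, and trend is not stated here.

```shell
#!/usr/bin/env bash
# Weights and grade bands are taken from the document; the 0-100 component
# scale and the integer truncation are assumptions for this sketch.

# weighted_score <success> <duration> <consistency> <trend>   (each 0-100)
weighted_score() {
  echo $(( (40 * $1 + 30 * $2 + 20 * $3 + 10 * $4) / 100 ))
}

# grade <score>  -> letter grade from the table above
grade() {
  local score="$1"
  if   [ "$score" -ge 95 ]; then echo "A+"
  elif [ "$score" -ge 85 ]; then echo "A"
  elif [ "$score" -ge 70 ]; then echo "B"
  elif [ "$score" -ge 50 ]; then echo "C"
  else                           echo "D"
  fi
}

# (40*100 + 30*80 + 20*90 + 10*50) / 100 = 8700 / 100 = 87
score="$(weighted_score 100 80 90 50)"
echo "score=$score grade=$(grade "$score")"   # prints "score=87 grade=A"
```

Under these assumptions a perfect success rate cannot reach A+ on its own: success contributes at most 40 points, so the other three components have to carry the rest.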
When you omit --duration, skillbench pulls from tasktime:
tt start "Create PR" -c git
# ... do work ...
tt stop
skillbench record --success # Duration auto-pulled
Benchmarks sync to ClawVault automatically.
skillbench skills shows: