auto-improvement-evaluator

Use when you need to verify that a Skill improvement actually improves AI execution performance. Runs AI tasks against a predefined task suite (YAML), judges each task as pass/fail, and outputs an execution_pass_rate. Not for scoring document structure (use improvement-learner) or for ranking candidates (use improvement-discriminator).
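The evaluation loop described above can be sketched in a few lines of Python. The task-suite schema (name/prompt/expected) and the substring-based pass check are assumptions for illustration; the actual YAML layout and judging logic used by auto-improvement-evaluator may differ.

```python
# Minimal sketch of the run-then-judge loop, under an assumed suite schema.
task_suite = [
    {"name": "greeting", "prompt": "Say hello", "expected": "hello"},
    {"name": "math", "prompt": "What is 2+2?", "expected": "4"},
]

def run_ai_task(prompt: str) -> str:
    """Stand-in for the real AI call; returns a canned answer."""
    canned = {"Say hello": "hello, world", "What is 2+2?": "The answer is 4."}
    return canned.get(prompt, "")

def evaluate(suite) -> float:
    """Judge each task pass/fail and return the execution_pass_rate."""
    passed = sum(
        1 for task in suite
        if task["expected"] in run_ai_task(task["prompt"])
    )
    return passed / len(suite)

print(f"execution_pass_rate = {evaluate(task_suite):.2f}")  # → 1.00
```

In practice the suite would be loaded from the YAML file and `run_ai_task` would invoke the model with the candidate SKILL.md in context.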
Install via ClawdBot CLI:
clawdbot install lanyasheng/auto-improvement-evaluator

Grade: Fair — based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Accesses sensitive credential files or environment variables
/etc/passwd

AI Analysis
This skill is designed for local evaluation of AI skill improvements through task execution and contains no instructions for data exfiltration or unauthorized API calls. The credential-access signal appears to be a false positive triggered by a scan for '/etc/passwd' references; the skill's actual functionality runs predefined test suites against AI outputs and does not access sensitive files. The skill operates in controlled evaluation contexts with clear boundaries.
Audited Apr 16, 2026 · audit v1.0
Generated Apr 17, 2026
Used by AI development teams to validate that updates to a skill's documentation (SKILL.md) actually improve the AI's task execution performance. Teams run a predefined task suite in YAML format to compare candidate skill versions against a baseline, ensuring changes lead to measurable execution pass rate improvements before deployment.
Employed by QA engineers in companies building AI-driven tools (e.g., chatbots, code assistants) to systematically test skill enhancements. By executing task suites with judges like ContainsJudge or PytestJudge, they verify that AI outputs meet deterministic criteria, reducing regression risks and maintaining high execution standards.
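A deterministic judge of the kind named above can be sketched as follows. The class name ContainsJudge comes from the listing, but its interface and semantics here (pass iff every required snippet appears in the output) are assumptions; the real judge's API may differ.

```python
class ContainsJudge:
    """Assumed semantics: pass iff every required snippet
    appears verbatim in the AI output."""
    def __init__(self, must_contain: list[str]):
        self.must_contain = must_contain

    def judge(self, output: str) -> bool:
        return all(s in output for s in self.must_contain)

# Verify that generated code defines a function and returns a value.
judge = ContainsJudge(["def add", "return"])
print(judge.judge("def add(a, b):\n    return a + b"))  # True
```

A PytestJudge would follow the same interface but execute a test file against the generated code instead of checking substrings.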
Used by educational technology firms to assess AI skills designed for tutoring or content generation. Running standalone evaluations on current SKILL.md files helps identify baseline failures in tasks like explaining concepts or generating structured outputs, ensuring AI provides accurate and pedagogically sound responses.
Applied by large enterprises integrating custom AI skills into business workflows (e.g., report generation, data analysis). The evaluator's pipeline mode compares candidate skills against baselines using thresholds, allowing teams to gate deployments based on execution pass rates and avoid performance degradation in production environments.
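The baseline-versus-candidate gating described in the last use case reduces to a simple threshold comparison. This is a hypothetical sketch of the rule, not the evaluator's actual pipeline-mode implementation; the function name and threshold parameter are illustrative.

```python
def gate_deployment(baseline_rate: float, candidate_rate: float,
                    min_improvement: float = 0.0) -> bool:
    """Assumed pipeline-mode rule: deploy only if the candidate's
    execution_pass_rate beats the baseline by at least the threshold."""
    return candidate_rate - baseline_rate >= min_improvement

# Candidate improved from 70% to 85% against a 5-point threshold: deploy.
print(gate_deployment(0.70, 0.85, min_improvement=0.05))  # True
```

Teams would wire this check into CI so that a candidate SKILL.md that regresses, or improves by less than the threshold, never reaches production.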
Offers a cloud-based service where users can upload, test, and evaluate AI skills using the Improvement Evaluator. Revenue is generated through subscription tiers based on usage limits, such as number of task suite runs or advanced judge types like LLMRubricJudge, targeting AI developers and enterprises.
Provides expert consulting to help companies implement and customize the Improvement Evaluator within their AI pipelines. Revenue comes from project-based fees for setting up task suites, integrating with existing systems, and ongoing support to ensure skill improvements translate to better AI performance.
Distributes the Improvement Evaluator as open-source software under MIT license, with monetization through premium features like enhanced caching, advanced analytics dashboards, or priority support. Targets tech-savvy users who need scalable evaluation capabilities beyond the basic CLI tool.
💬 Integration Tip
Integrate the evaluator into CI/CD pipelines by running standalone mode with mock options for automated testing, ensuring skill changes are validated against task suites before merging.
Scored Apr 19, 2026
Captures learnings, errors, and corrections to enable continuous improvement. Use when: (1) A command or operation fails unexpectedly, (2) User corrects Clau...
Self-reflection + Self-criticism + Self-learning + Self-organizing memory. Agent evaluates its own work, catches mistakes, and improves permanently. Use when...
A self-evolution engine for AI agents. Analyzes runtime history to identify improvements and applies protocol-constrained evolution. Communicates with EvoMap...
AI self-improvement and memory system. Addresses the pain point of 'repeating the same class of mistakes and not learning from user corrections'. Automatically captures errors, user corrections, and best practices, and converts them into long-term memory.
Self-improving agent system that analyzes conversation quality, identifies improvement opportunities, and continuously optimizes response strategies.