auto-improvement-evaluator

Use when you need to verify that a Skill improvement actually improves AI execution performance. Runs AI tasks against a predefined task suite (YAML), judges each task as pass/fail, and outputs an execution_pass_rate. Not for scoring document structure (use improvement-learner) or for ranking candidates (use improvement-discriminator).
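The evaluation loop described above can be sketched in a few lines of Python. The task-suite schema (name/prompt/expected) and the substring-based pass check are assumptions for illustration; the actual YAML layout and judging logic used by auto-improvement-evaluator may differ.

```python
# Minimal sketch of the run-then-judge loop, under an assumed suite schema.
task_suite = [
    {"name": "greeting", "prompt": "Say hello", "expected": "hello"},
    {"name": "math", "prompt": "What is 2+2?", "expected": "4"},
]

def run_ai_task(prompt: str) -> str:
    """Stand-in for the real AI call; returns a canned answer."""
    canned = {"Say hello": "hello, world", "What is 2+2?": "The answer is 4."}
    return canned.get(prompt, "")

def evaluate(suite) -> float:
    """Judge each task pass/fail and return the execution_pass_rate."""
    passed = sum(
        1 for task in suite
        if task["expected"] in run_ai_task(task["prompt"])
    )
    return passed / len(suite)

print(f"execution_pass_rate = {evaluate(task_suite):.2f}")  # → 1.00
```

In practice the suite would be loaded from the YAML file and `run_ai_task` would invoke the model with the candidate SKILL.md in context.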
Install via ClawdBot CLI:
clawdbot install lanyasheng/auto-improvement-evaluator

Grade: Fair — based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Accesses sensitive credential files or environment variables
/etc/passwd

AI Analysis
This skill is designed for local evaluation of AI skill improvements through task execution and contains no instructions for data exfiltration or unauthorized API calls. The credential-access signal appears to be a false positive triggered by a scan for '/etc/passwd' references; the skill's actual functionality runs predefined test suites against AI outputs and does not access sensitive files. The skill operates in controlled evaluation contexts with clear boundaries.
Audited Apr 16, 2026 · audit v1.0
Generated Apr 17, 2026
Used by AI development teams to validate that updates to a skill's documentation (SKILL.md) actually improve the AI's task execution performance. Teams run a predefined task suite in YAML format to compare candidate skill versions against a baseline, ensuring changes lead to measurable execution pass rate improvements before deployment.
Employed by QA engineers in companies building AI-driven tools (e.g., chatbots, code assistants) to systematically test skill enhancements. By executing task suites with judges like ContainsJudge or PytestJudge, they verify that AI outputs meet deterministic criteria, reducing regression risks and maintaining high execution standards.
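A deterministic judge of the kind named above can be sketched as follows. The class name ContainsJudge comes from the listing, but its interface and semantics here (pass iff every required snippet appears in the output) are assumptions; the real judge's API may differ.

```python
class ContainsJudge:
    """Assumed semantics: pass iff every required snippet
    appears verbatim in the AI output."""
    def __init__(self, must_contain: list[str]):
        self.must_contain = must_contain

    def judge(self, output: str) -> bool:
        return all(s in output for s in self.must_contain)

# Verify that generated code defines a function and returns a value.
judge = ContainsJudge(["def add", "return"])
print(judge.judge("def add(a, b):\n    return a + b"))  # True
```

A PytestJudge would follow the same interface but execute a test file against the generated code instead of checking substrings.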
Used by educational technology firms to assess AI skills designed for tutoring or content generation. Running standalone evaluations on current SKILL.md files helps identify baseline failures in tasks like explaining concepts or generating structured outputs, ensuring AI provides accurate and pedagogically sound responses.
Applied by large enterprises integrating custom AI skills into business workflows (e.g., report generation, data analysis). The evaluator's pipeline mode compares candidate skills against baselines using thresholds, allowing teams to gate deployments based on execution pass rates and avoid performance degradation in production environments.
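The baseline-versus-candidate gating described in the last use case reduces to a simple threshold comparison. This is a hypothetical sketch of the rule, not the evaluator's actual pipeline-mode implementation; the function name and threshold parameter are illustrative.

```python
def gate_deployment(baseline_rate: float, candidate_rate: float,
                    min_improvement: float = 0.0) -> bool:
    """Assumed pipeline-mode rule: deploy only if the candidate's
    execution_pass_rate beats the baseline by at least the threshold."""
    return candidate_rate - baseline_rate >= min_improvement

# Candidate improved from 70% to 85% against a 5-point threshold: deploy.
print(gate_deployment(0.70, 0.85, min_improvement=0.05))  # True
```

Teams would wire this check into CI so that a candidate SKILL.md that regresses, or improves by less than the threshold, never reaches production.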
Offers a cloud-based service where users can upload, test, and evaluate AI skills using the Improvement Evaluator. Revenue is generated through subscription tiers based on usage limits, such as number of task suite runs or advanced judge types like LLMRubricJudge, targeting AI developers and enterprises.
Provides expert consulting to help companies implement and customize the Improvement Evaluator within their AI pipelines. Revenue comes from project-based fees for setting up task suites, integrating with existing systems, and ongoing support to ensure skill improvements translate to better AI performance.
Distributes the Improvement Evaluator as open-source software under MIT license, with monetization through premium features like enhanced caching, advanced analytics dashboards, or priority support. Targets tech-savvy users who need scalable evaluation capabilities beyond the basic CLI tool.
💬 Integration Tip
Integrate the evaluator into CI/CD pipelines by running standalone mode with mock options for automated testing, ensuring skill changes are validated against task suites before merging.
Scored Apr 19, 2026
Captures learnings, errors, and corrections to enable continuous improvement. Use when: (1) A command or operation fails unexpectedly, (2) User corrects Clau...
Self-reflection + Self-criticism + Self-learning + Self-organizing memory. Agent evaluates its own work, catches mistakes, and improves permanently. Use when...
A self-evolution engine for AI agents. Analyzes runtime history to identify improvements and applies protocol-constrained evolution. Communicates with EvoMap...
AI self-improvement and memory system. Addresses the pain point of 'repeating the same class of mistakes and not learning from user corrections'. Automatically captures errors, user corrections, and best practices, and converts them into long-term memory.
Self-improving agent system that analyzes conversation quality, identifies improvement opportunities, and continuously optimizes response strategies.