improvement-evaluator
Use when you need to verify that a Skill improvement actually improves AI execution. Runs AI tasks against a predefined task suite (YAML), judges each task as pass/fail, and outputs execution_pass_rate. Not for scoring document structure (use improvement-learner) or ranking candidates (use improvement-discriminator).
Install via the ClawdBot CLI:
clawdbot install lanyasheng/improvement-evaluator

Grade: Fair — based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Accesses sensitive credential files or environment variables
/etc/passwd

AI Analysis
The skill definition describes a legitimate testing framework for evaluating AI skill improvements through task execution. It focuses on running predefined YAML task suites, comparing pass rates, and caching results. No evidence of data exfiltration, credential harvesting, unauthorized API calls, or obfuscated malicious behavior is present in the provided content.
Audited Apr 16, 2026 · audit v1.0
Generated Apr 29, 2026
Integrate the evaluator into a continuous integration pipeline to automatically run a task suite against any proposed change to a SKILL.md file. Compare execution pass rates with baseline to detect regressions before merging, ensuring that AI task performance does not degrade.
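The regression gate described above can be sketched as follows. This is a minimal illustration, not the evaluator's real API: the report shape and both function names are assumptions.

```python
# Hypothetical CI gate: compare a candidate skill's execution pass rate
# against the baseline and accept only if it does not regress.

def pass_rate(report: dict) -> float:
    """Fraction of tasks in an evaluation report judged as pass."""
    results = report["results"]
    passed = sum(1 for r in results if r["status"] == "pass")
    return passed / len(results)

def gate(baseline: float, candidate: float, tolerance: float = 0.0) -> bool:
    """Accept the revision only if it does not regress beyond tolerance."""
    return candidate >= baseline - tolerance

# Simulated reports: baseline passes 8/10 tasks, candidate passes 9/10.
baseline_report = {"results": [{"status": "pass"}] * 8 + [{"status": "fail"}] * 2}
candidate_report = {"results": [{"status": "pass"}] * 9 + [{"status": "fail"}] * 1}

print(gate(pass_rate(baseline_report), pass_rate(candidate_report)))  # True
```

In a real pipeline, the reports would come from the evaluator's JSON output, and a False result would fail the CI step before merge.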
Use the evaluator to validate improvements to an AI chatbot's instruction set. Define task suites that test core conversational behaviors; only deploy new skill versions that show a positive delta in execution pass rate, preventing poorly performing updates from reaching production.
Data scientists can leverage the evaluator to compare multiple candidate prompts or few-shot examples on a standardized task suite. The execution pass rate provides an objective metric to select the most effective approach for real-world task completion.
When writing a new task suite for a skill, use the evaluator's baseline run to check that the tasks are neither too easy nor too hard: a baseline pass rate near 100% leaves no headroom to measure improvement, while one near 0% suggests the tasks are unachievable. A moderate baseline pass rate validates that the suite is a meaningful test of the skill's capabilities.
Combine the evaluator with an improvement learner and discriminator to form a full optimization loop. The evaluator provides the execution pass rate signal that gates whether a skill revision should be accepted, replacing subjective document structure scoring with empirical performance data.
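A hedged sketch of that optimization loop follows. The three callables stand in for the learner, discriminator, and evaluator; none of these names come from the actual packages.

```python
# Learner proposes revisions, discriminator ranks them, evaluator gates
# acceptance on a positive execution-pass-rate delta.
def optimize(skill: str,
             propose_revisions,   # improvement-learner stand-in: skill -> list[str]
             score,               # improvement-discriminator stand-in: str -> float
             evaluate,            # improvement-evaluator stand-in: str -> pass rate
             rounds: int = 3) -> str:
    baseline = evaluate(skill)
    for _ in range(rounds):
        candidates = propose_revisions(skill)
        best = max(candidates, key=score)   # discriminator picks the top candidate
        rate = evaluate(best)               # evaluator supplies the empirical signal
        if rate > baseline:                 # accept only on a positive delta
            skill, baseline = best, rate
    return skill
```

The key design point matches the text above: the discriminator's score only selects which candidate to test, while the evaluator's pass rate alone decides whether the revision is accepted.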
Offer a free tier with a limited number of task suite runs per month, plus premium plans for higher throughput and priority support. Revenue comes from subscription fees scaled by usage volume.
License the evaluator as part of a larger continuous improvement platform for AI skills, including orchestration, version control, and analytics. Revenue is generated through enterprise contracts and professional services for custom integrations.
Provide the evaluator as a cloud API that CI/CD systems call to validate skill updates. Charge per evaluation run, with tiered pricing for high-volume users. Additional revenue from caching and historical data access.
💬 Integration Tip
To integrate, wrap the evaluator in a CI step; cache baseline results for 7 days to reduce API costs. Ensure task suites are versioned and stored alongside skill definitions for reproducibility.
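The 7-day baseline cache in the tip above might look like this minimal sketch. The cache directory, key scheme, and file layout are all assumptions; keying on both the suite text and the skill revision means editing either invalidates the cached baseline.

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path(".evaluator-cache")
TTL = 7 * 24 * 3600  # seven days, per the tip above

def cache_key(suite_text: str, skill_rev: str) -> str:
    """Key a baseline by suite content plus skill revision."""
    return hashlib.sha256(f"{skill_rev}\n{suite_text}".encode()).hexdigest()

def load_baseline(key: str):
    """Return the cached pass rate, or None if missing or expired."""
    path = CACHE_DIR / f"{key}.json"
    if not path.exists():
        return None
    entry = json.loads(path.read_text())
    if time.time() - entry["saved_at"] > TTL:
        return None  # expired: force a fresh baseline run
    return entry["pass_rate"]

def save_baseline(key: str, pass_rate: float) -> None:
    CACHE_DIR.mkdir(exist_ok=True)
    entry = {"saved_at": time.time(), "pass_rate": pass_rate}
    (CACHE_DIR / f"{key}.json").write_text(json.dumps(entry))
```

A CI step would call load_baseline first and only re-run the baseline suite on a cache miss, which is where the API-cost savings come from.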
Scored Apr 19, 2026