improvement-evaluator
Use when you need to verify that a Skill improvement actually improves AI execution. Runs AI tasks against a predefined task suite (YAML), judges each task as pass/fail, and outputs execution_pass_rate. Not for scoring document structure (use improvement-learner) or ranking candidates (use improvement-discriminator).
Install via the ClawdBot CLI:
clawdbot install lanyasheng/improvement-evaluator

Grade: Fair — based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Accesses sensitive credential files or environment variables
/etc/passwd

AI Analysis
The skill definition describes a legitimate testing framework for evaluating AI skill improvements through task execution. It focuses on running predefined YAML task suites, comparing pass rates, and caching results. No evidence of data exfiltration, credential harvesting, unauthorized API calls, or obfuscated malicious behavior is present in the provided content.
Audited Apr 16, 2026 · audit v1.0
Generated Apr 29, 2026
Integrate the evaluator into a continuous integration pipeline to automatically run a task suite against any proposed change to a SKILL.md file. Compare execution pass rates with baseline to detect regressions before merging, ensuring that AI task performance does not degrade.
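The regression gate described above can be sketched as follows. This is a minimal illustration, not the evaluator's real API: the report shape and both function names are assumptions.

```python
# Hypothetical CI gate: compare a candidate skill's execution pass rate
# against the baseline and accept only if it does not regress.

def pass_rate(report: dict) -> float:
    """Fraction of tasks in an evaluation report judged as pass."""
    results = report["results"]
    passed = sum(1 for r in results if r["status"] == "pass")
    return passed / len(results)

def gate(baseline: float, candidate: float, tolerance: float = 0.0) -> bool:
    """Accept the revision only if it does not regress beyond tolerance."""
    return candidate >= baseline - tolerance

# Simulated reports: baseline passes 8/10 tasks, candidate passes 9/10.
baseline_report = {"results": [{"status": "pass"}] * 8 + [{"status": "fail"}] * 2}
candidate_report = {"results": [{"status": "pass"}] * 9 + [{"status": "fail"}] * 1}

print(gate(pass_rate(baseline_report), pass_rate(candidate_report)))  # True
```

In a real pipeline, the reports would come from the evaluator's JSON output, and a False result would fail the CI step before merge.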
Use the evaluator to validate improvements to an AI chatbot's instruction set. Define task suites that test core conversational behaviors; only deploy new skill versions that show a positive delta in execution pass rate, preventing poorly performing updates from reaching production.
Data scientists can leverage the evaluator to compare multiple candidate prompts or few-shot examples on a standardized task suite. The execution pass rate provides an objective metric to select the most effective approach for real-world task completion.
When writing a new task suite for a skill, use the evaluator's baseline run to check that the tasks are neither too easy nor too hard: a baseline pass rate near 100% leaves no headroom to measure improvement, while one near 0% suggests the tasks are unachievable. A moderate baseline pass rate validates that the suite is a meaningful test of the skill's capabilities.
Combine the evaluator with an improvement learner and discriminator to form a full optimization loop. The evaluator provides the execution pass rate signal that gates whether a skill revision should be accepted, replacing subjective document structure scoring with empirical performance data.
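A hedged sketch of that optimization loop follows. The three callables stand in for the learner, discriminator, and evaluator; none of these names come from the actual packages.

```python
# Learner proposes revisions, discriminator ranks them, evaluator gates
# acceptance on a positive execution-pass-rate delta.
def optimize(skill: str,
             propose_revisions,   # improvement-learner stand-in: skill -> list[str]
             score,               # improvement-discriminator stand-in: str -> float
             evaluate,            # improvement-evaluator stand-in: str -> pass rate
             rounds: int = 3) -> str:
    baseline = evaluate(skill)
    for _ in range(rounds):
        candidates = propose_revisions(skill)
        best = max(candidates, key=score)   # discriminator picks the top candidate
        rate = evaluate(best)               # evaluator supplies the empirical signal
        if rate > baseline:                 # accept only on a positive delta
            skill, baseline = best, rate
    return skill
```

The key design point matches the text above: the discriminator's score only selects which candidate to test, while the evaluator's pass rate alone decides whether the revision is accepted.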
Offer a free tier with a limited number of task suite runs per month, plus premium plans for higher throughput and priority support. Revenue comes from subscription fees scaled by usage volume.
License the evaluator as part of a larger continuous improvement platform for AI skills, including orchestration, version control, and analytics. Revenue is generated through enterprise contracts and professional services for custom integrations.
Provide the evaluator as a cloud API that CI/CD systems call to validate skill updates. Charge per evaluation run, with tiered pricing for high-volume users. Additional revenue from caching and historical data access.
💬 Integration Tip
To integrate, wrap the evaluator in a CI step; cache baseline results for 7 days to reduce API costs. Ensure task suites are versioned and stored alongside skill definitions for reproducibility.
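The 7-day baseline cache in the tip above might look like this minimal sketch. The cache directory, key scheme, and file layout are all assumptions; keying on both the suite text and the skill revision means editing either invalidates the cached baseline.

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path(".evaluator-cache")
TTL = 7 * 24 * 3600  # seven days, per the tip above

def cache_key(suite_text: str, skill_rev: str) -> str:
    """Key a baseline by suite content plus skill revision."""
    return hashlib.sha256(f"{skill_rev}\n{suite_text}".encode()).hexdigest()

def load_baseline(key: str):
    """Return the cached pass rate, or None if missing or expired."""
    path = CACHE_DIR / f"{key}.json"
    if not path.exists():
        return None
    entry = json.loads(path.read_text())
    if time.time() - entry["saved_at"] > TTL:
        return None  # expired: force a fresh baseline run
    return entry["pass_rate"]

def save_baseline(key: str, pass_rate: float) -> None:
    CACHE_DIR.mkdir(exist_ok=True)
    entry = {"saved_at": time.time(), "pass_rate": pass_rate}
    (CACHE_DIR / f"{key}.json").write_text(json.dumps(entry))
```

A CI step would call load_baseline first and only re-run the baseline suite on a cache miss, which is where the API-cost savings come from.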
Scored Apr 19, 2026