eval-skills: AI Agent Skill unit testing framework. A framework-agnostic toolkit for discovering, scaffolding, selecting, evaluating, and reporting on AI skills. Use this...
Install via ClawdBot CLI:
clawdbot install islinxu/eval-skills

Grade: Fair, based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
- Accesses sensitive credential files or environment variables: /etc/passwd
- Potentially destructive shell commands in tool definitions: rm -rf /
- Accesses system directories or attempts privilege escalation: /proc/
- Calls external URL not in known-safe list: https://github.com/you/my-skill

Generated Mar 21, 2026
A software development team uses eval-skills to run unit tests on new AI skills before integrating them into production agents. This ensures each skill meets predefined quality gates, such as a minimum completion rate, preventing faulty components from degrading overall agent performance and reducing debugging time.
A customer service company evaluates multiple candidate skills, like sentiment analysis or FAQ retrieval, using eval-skills to rank them on a benchmark of real customer queries. This helps select the most reliable skills for their chatbot, improving response accuracy and user satisfaction while minimizing operational costs.
An AI platform incorporates eval-skills into its continuous integration pipeline to automatically test skill upgrades. If a regression is detected, such as a drop in completion rate below a threshold, the pipeline blocks the merge, ensuring only high-quality updates are deployed and maintaining system stability.
An edtech company uses eval-skills to compare different tutoring skills, such as math problem solvers or language translators, on standardized educational benchmarks. This allows them to choose the best-performing skills for their learning platform, enhancing educational outcomes and scalability.
Researchers in academia or industry use eval-skills to quickly generate skill skeletons from templates, such as for HTTP requests or Python scripts, and then evaluate them against custom benchmarks. This accelerates prototyping and validation of AI capabilities in experimental settings.
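The quality gates described in the use cases above boil down to a simple threshold check on a metric such as completion rate. A minimal sketch of that gating logic follows; the `passes_quality_gate` helper and the result-dict shape are illustrative assumptions, not eval-skills' actual API.

```python
# Illustrative quality-gate logic: block an integration when a skill's
# completion rate falls below a configured threshold.

def passes_quality_gate(results, min_completion_rate=0.9):
    """Return True if the fraction of completed runs meets the threshold."""
    if not results:
        return False  # no evidence of quality: fail closed
    completed = sum(1 for r in results if r.get("completed"))
    return completed / len(results) >= min_completion_rate

# Example: 2 of 3 runs completed, a 0.667 completion rate.
runs = [{"completed": True}, {"completed": True}, {"completed": False}]
print(passes_quality_gate(runs, min_completion_rate=0.9))  # False (0.667 < 0.9)
print(passes_quality_gate(runs, min_completion_rate=0.6))  # True  (0.667 >= 0.6)
```

Failing closed on an empty result set is a deliberate choice here: a gate that passes when no evaluations ran would let untested skills through.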
Offer eval-skills as a cloud-based service where users can upload skills, run evaluations on hosted benchmarks, and access detailed reports via a web dashboard. Revenue is generated through subscription tiers based on usage volume, such as number of evaluations per month or advanced analytics features.
Provide consulting services to large organizations for integrating eval-skills into their AI development workflows, including custom benchmark creation, CI/CD setup, and training. Revenue comes from project-based fees and ongoing support contracts, targeting industries like finance or healthcare with strict quality requirements.
Operate a marketplace where developers can list their AI skills along with eval-skills-generated reports, showcasing performance metrics. Revenue is earned via commission on sales or listing fees, helping buyers make informed decisions and promoting high-quality skill adoption.
💬 Integration Tip
Start by integrating eval-skills into a CI/CD pipeline with a simple benchmark to automate skill testing; use the --exit-on-fail flag to enforce quality gates and prevent regressions in production deployments.
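In a CI/CD pipeline, enforcing the gate means treating a nonzero exit code from the evaluator as a blocked merge. The sketch below shows that pattern; the `eval-skills run` subcommand and benchmark path are hypothetical (only the --exit-on-fail flag is documented), so the evaluator is simulated with a tiny inline script to keep the example runnable anywhere.

```python
import subprocess
import sys

# Stand-in for the real evaluator invocation, which might look like:
#   ["eval-skills", "run", "--benchmark", "bench.yaml", "--exit-on-fail"]
# (subcommand and arguments are assumptions; --exit-on-fail is documented).
# Here we simulate a failing evaluation with an inline script exiting 1.
evaluator = [sys.executable, "-c", "import sys; sys.exit(1)"]

result = subprocess.run(evaluator)
if result.returncode != 0:
    print("Quality gate failed; blocking merge.")
else:
    print("Quality gate passed.")
```

In a real pipeline the wrapper would propagate the code with `sys.exit(result.returncode)` so the CI job itself fails and the merge is blocked.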
Scored Apr 19, 2026
- Uses known external API (expected, informational): api.openai.com

AI Analysis
The skill appears to be a legitimate testing/evaluation framework for AI skills with no evidence of malicious data exfiltration or credential harvesting. The detected signals are likely from example/test code within the framework rather than active malicious behavior. The external API usage (api.openai.com) is consistent with AI skill evaluation purposes.
Audited Apr 17, 2026 · audit v1.0
Guide for creating effective skills. This skill should be used when users want to create a new skill (or update an existing skill) that extends Claude's capabilities with specialized knowledge, workflows, or tool integrations.
Systematic code review patterns covering security, performance, maintainability, correctness, and testing — with severity levels, structured feedback guidance, review process, and anti-patterns to avoid. Use when reviewing PRs, establishing review standards, or improving review quality.
Coding style memory that adapts to your preferences, conventions, and patterns for consistent coding.
Provides a 7-step debugging protocol plus language-specific commands to systematically identify, verify, and fix software bugs across multiple environments.
Control and operate Opencode via slash commands. Use this skill to manage sessions, select models, switch agents (plan/build), and coordinate coding through Opencode.
Use when starting any conversation: establishes how to find and use skills, requiring a Skill tool invocation before ANY response, including clarifying questions.