⛔ This skill has been removed from clawhub.ai. It is no longer available for installation and may not function correctly. The information shown here is preserved for reference only.

📈 Analytics & BI

Llm Evaluator v1.0.0

Name: Llm Evaluator
Author: aiwithabidi

llm-evaluator

LLM-as-a-Judge evaluation system using Langfuse. Score AI outputs on relevance, accuracy, hallucination, and helpfulness. Backfill scoring on historical trac...

latest

Downloads: 198

Created: Mar 5, 2026

Updated: May 10, 2026

Install & Quick Start

Install via ClawdBot CLI:

clawdbot install aiwithabidi/llm-evaluator

https://www.agxntsix.ai

Skill Package (2 files)

📋 SKILL.md (markdown)

Quality Score

B (53/100)

Grade: Fair. Based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.

Market Validation: 2/35

· No tracked installs (may still have manual users)
· 198 downloads (low demand)

Documentation: 18/25

· SKILL.md present
· Moderate documentation (≥1500 chars)
· Contains usage examples or trigger description
· Detailed summary

Package Completeness: 8/15

· skillAssets present (1 file)

Security Analysis

💙 Low Risk

UNDOCUMENTED_EXTERNAL (low)

Calls external URL not in known-safe list

https://www.agxntsix.ai

Audited Apr 17, 2026 · audit v1.0

💡 Usage Guide

Generated Mar 21, 2026

· AI developers and data scientists
· Quality assurance teams in tech companies
· Business analysts monitoring AI performance
· Level: intermediate

💡 Application Scenarios

Customer Support Quality Assurance (E-commerce and SaaS)

Monitor AI chatbot responses in customer service to ensure relevance and accuracy, reducing misinformation and improving user satisfaction. Automatically score historical interactions to identify areas for model improvement.

Healthcare Information Verification (Healthcare Technology)

Evaluate AI-generated medical advice or symptom analysis for factual correctness and hallucination detection, ensuring compliance with health regulations. Use batch scoring to audit recent outputs for safety.

Educational Content Assessment (EdTech)

Score AI tutors or learning assistants on helpfulness and accuracy in educational responses, enhancing learning outcomes. Backfill evaluations on past traces to refine curriculum alignment.

Financial Report Analysis (Finance and Banking)

Assess AI-generated financial summaries or market insights for relevance and accuracy, minimizing risks from erroneous data. Use specific evaluators like accuracy for critical fact-checking tasks.

Legal Document Review (Legal Services)

Evaluate AI outputs in legal research or contract analysis for hallucination and relevance, ensuring reliable information for case preparation. Batch score traces to maintain quality standards over time.
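The batch-scoring idea that recurs in these scenarios can be sketched in a few lines. This is an illustration only, not the skill's actual code: the function names, the `scored_traces` shape, and the 0.5 threshold are hypothetical, and it assumes every dimension is scored from 0 to 1 with higher meaning better (so a high hallucination score means little hallucination).

```python
from statistics import mean

# The four dimensions named in the skill summary.
DIMENSIONS = ("relevance", "accuracy", "hallucination", "helpfulness")


def flag_low_scores(scored_traces, threshold=0.5):
    """Map each trace id to the dimensions that scored below the threshold.

    scored_traces is a hypothetical shape: {trace_id: {dimension: score}}.
    Traces with no low-scoring dimension are left out of the result.
    """
    flagged = {}
    for trace_id, scores in scored_traces.items():
        low = [d for d in DIMENSIONS if d in scores and scores[d] < threshold]
        if low:
            flagged[trace_id] = low
    return flagged


def dimension_averages(scored_traces):
    """Average each dimension over the traces that have a score for it."""
    averages = {}
    for dimension in DIMENSIONS:
        values = [s[dimension] for s in scored_traces.values() if dimension in s]
        if values:
            averages[dimension] = mean(values)
    return averages
```

Running only a specific evaluator, such as accuracy, corresponds to traces whose score dictionaries carry just that dimension; both helpers skip dimensions that are absent.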

💼 Business Models

SaaS Subscription: Recurring fees from $99 to $999 per month

Offer the evaluator as a cloud-based service with tiered pricing based on usage volume, targeting businesses needing continuous AI output monitoring. Revenue streams include monthly subscriptions and pay-per-evaluation fees.

Consulting and Integration: Project-based fees ranging from $5,000 to $50,000

Provide setup and customization services for integrating the evaluator into existing AI workflows, including training and support. Revenue comes from one-time project fees and ongoing maintenance contracts.

White-Label Solution: Annual licensing fees starting at $10,000 plus usage-based royalties

License the evaluator technology to other AI platforms or enterprises for embedding into their products, with revenue from licensing fees and royalties. Targets companies seeking to enhance their own AI evaluation capabilities.

💬 Integration Tip

Ensure your Langfuse instance is properly configured and the OpenRouter API key is set in environment variables before running evaluation scripts.
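A preflight check along these lines can catch a missing key before any evaluation runs. It is a minimal sketch under assumptions: `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` are the names the Langfuse SDK reads, while `OPENROUTER_API_KEY` is a common convention that this listing does not confirm for the skill. The helper names are hypothetical.

```python
import os

# Assumed variable names; the skill's own SKILL.md could not be loaded,
# so adjust these to whatever the evaluation scripts actually read.
REQUIRED_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "OPENROUTER_API_KEY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


def require_env(env=None):
    """Raise RuntimeError listing every missing variable; otherwise do nothing."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env()` at the top of a script fails fast with the full list of missing names. Self-hosted Langfuse setups typically also set `LANGFUSE_HOST`; it is left out of the required list here because the SDK falls back to a default host.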