senior-prompt-engineerThis skill should be used when the user asks to "optimize prompts", "design prompt templates", "evaluate LLM outputs", "build agentic systems", "implement RAG", "create few-shot examples", "analyze token usage", or "design AI workflows". Use for prompt engineering patterns, LLM evaluation frameworks, agent architectures, and structured output design.
Install via ClawdBot CLI:
clawdbot install alirezarezvani/senior-prompt-engineerPrompt engineering patterns, LLM evaluation frameworks, and agentic system design.
# Analyze and optimize a prompt file
python scripts/prompt_optimizer.py prompts/my_prompt.txt --analyze
# Evaluate RAG retrieval quality
python scripts/rag_evaluator.py --contexts contexts.json --questions questions.json
# Visualize agent workflow from definition
python scripts/agent_orchestrator.py agent_config.yaml --visualize
Analyzes prompts for token efficiency, clarity, and structure. Generates optimized versions.
Input: Prompt text file or string
Output: Analysis report with optimization suggestions
Usage:
# Analyze a prompt file
python scripts/prompt_optimizer.py prompt.txt --analyze
# Output:
# Token count: 847
# Estimated cost: $0.0025 (GPT-4)
# Clarity score: 72/100
# Issues found:
# - Ambiguous instruction at line 3
# - Missing output format specification
# - Redundant context (lines 12-15 repeat lines 5-8)
# Suggestions:
# 1. Add explicit output format: "Respond in JSON with keys: ..."
# 2. Remove redundant context to save 89 tokens
# 3. Clarify "analyze" -> "list the top 3 issues with severity ratings"
# Generate optimized version
python scripts/prompt_optimizer.py prompt.txt --optimize --output optimized.txt
# Count tokens for cost estimation
python scripts/prompt_optimizer.py prompt.txt --tokens --model gpt-4
# Extract and manage few-shot examples
python scripts/prompt_optimizer.py prompt.txt --extract-examples --output examples.json
Evaluates Retrieval-Augmented Generation quality by measuring context relevance and answer faithfulness.
Input: Retrieved contexts (JSON) and questions/answers
Output: Evaluation metrics and quality report
Usage:
# Evaluate retrieval quality
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json
# Output:
# === RAG Evaluation Report ===
# Questions evaluated: 50
#
# Retrieval Metrics:
# Context Relevance: 0.78 (target: >0.80)
# Retrieval Precision@5: 0.72
# Coverage: 0.85
#
# Generation Metrics:
# Answer Faithfulness: 0.91
# Groundedness: 0.88
#
# Issues Found:
# - 8 questions had no relevant context in top-5
# - 3 answers contained information not in context
#
# Recommendations:
# 1. Improve chunking strategy for technical documents
# 2. Add metadata filtering for date-sensitive queries
# Evaluate with custom metrics
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json \
--metrics relevance,faithfulness,coverage
# Export detailed results
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json \
--output report.json --verbose
Parses agent definitions and visualizes execution flows. Validates tool configurations.
Input: Agent configuration (YAML/JSON)
Output: Workflow visualization, validation report
Usage:
# Validate agent configuration
python scripts/agent_orchestrator.py agent.yaml --validate
# Output:
# === Agent Validation Report ===
# Agent: research_assistant
# Pattern: ReAct
#
# Tools (4 registered):
# [OK] web_search - API key configured
# [OK] calculator - No config needed
# [WARN] file_reader - Missing allowed_paths
# [OK] summarizer - Prompt template valid
#
# Flow Analysis:
# Max depth: 5 iterations
# Estimated tokens/run: 2,400-4,800
# Potential infinite loop: No
#
# Recommendations:
# 1. Add allowed_paths to file_reader for security
# 2. Consider adding early exit condition for simple queries
# Visualize agent workflow (ASCII)
python scripts/agent_orchestrator.py agent.yaml --visualize
# Output:
# ┌─────────────────────────────────────────┐
# │ research_assistant │
# │ (ReAct Pattern) │
# └─────────────────┬───────────────────────┘
# │
# ┌────────▼────────┐
# │ User Query │
# └────────┬────────┘
# │
# ┌────────▼────────┐
# │ Think │◄──────┐
# └────────┬────────┘ │
# │ │
# ┌────────▼────────┐ │
# │ Select Tool │ │
# └────────┬────────┘ │
# │ │
# ┌─────────────┼─────────────┐ │
# ▼ ▼ ▼ │
# [web_search] [calculator] [file_reader]
# │ │ │ │
# └─────────────┼─────────────┘ │
# │ │
# ┌────────▼────────┐ │
# │ Observe │───────┘
# └────────┬────────┘
# │
# ┌────────▼────────┐
# │ Final Answer │
# └─────────────────┘
# Export workflow as Mermaid diagram
python scripts/agent_orchestrator.py agent.yaml --visualize --format mermaid
Use when improving an existing prompt's performance or reducing token costs.
Step 1: Baseline current prompt
python scripts/prompt_optimizer.py current_prompt.txt --analyze --output baseline.json
Step 2: Identify issues
Review the analysis report for:
Step 3: Apply optimization patterns
| Issue | Pattern to Apply |
|-------|------------------|
| Ambiguous output | Add explicit format specification |
| Too verbose | Extract to few-shot examples |
| Inconsistent results | Add role/persona framing |
| Missing edge cases | Add constraint boundaries |
Step 4: Generate optimized version
python scripts/prompt_optimizer.py current_prompt.txt --optimize --output optimized.txt
Step 5: Compare results
python scripts/prompt_optimizer.py optimized.txt --analyze --compare baseline.json
# Shows: token reduction, clarity improvement, issues resolved
Step 6: Validate with test cases
Run both prompts against your evaluation set and compare outputs.
Use when creating examples for in-context learning.
Step 1: Define the task clearly
Task: Extract product entities from customer reviews
Input: Review text
Output: JSON with {product_name, sentiment, features_mentioned}
Step 2: Select diverse examples (3-5 recommended)
| Example Type | Purpose |
|--------------|---------|
| Simple case | Shows basic pattern |
| Edge case | Handles ambiguity |
| Complex case | Multiple entities |
| Negative case | What NOT to extract |
Step 3: Format consistently
Example 1:
Input: "Love my new iPhone 15, the camera is amazing!"
Output: {"product_name": "iPhone 15", "sentiment": "positive", "features_mentioned": ["camera"]}
Example 2:
Input: "The laptop was okay but battery life is terrible."
Output: {"product_name": "laptop", "sentiment": "mixed", "features_mentioned": ["battery life"]}
Step 4: Validate example quality
python scripts/prompt_optimizer.py prompt_with_examples.txt --validate-examples
# Checks: consistency, coverage, format alignment
Step 5: Test with held-out cases
Ensure model generalizes beyond your examples.
Use when you need reliable JSON/XML/structured responses.
Step 1: Define schema
{
"type": "object",
"properties": {
"summary": {"type": "string", "maxLength": 200},
"sentiment": {"enum": ["positive", "negative", "neutral"]},
"confidence": {"type": "number", "minimum": 0, "maximum": 1}
},
"required": ["summary", "sentiment"]
}
Step 2: Include schema in prompt
Respond with JSON matching this schema:
- summary (string, max 200 chars): Brief summary of the content
- sentiment (enum): One of "positive", "negative", "neutral"
- confidence (number 0-1): Your confidence in the sentiment
Step 3: Add format enforcement
IMPORTANT: Respond ONLY with valid JSON. No markdown, no explanation.
Start your response with { and end with }
Step 4: Validate outputs
python scripts/prompt_optimizer.py structured_prompt.txt --validate-schema schema.json
| File | Contains | Load when user asks about |
|------|----------|---------------------------|
| references/prompt_engineering_patterns.md | 10 prompt patterns with input/output examples | "which pattern?", "few-shot", "chain-of-thought", "role prompting" |
| references/llm_evaluation_frameworks.md | Evaluation metrics, scoring methods, A/B testing | "how to evaluate?", "measure quality", "compare prompts" |
| references/agentic_system_design.md | Agent architectures (ReAct, Plan-Execute, Tool Use) | "build agent", "tool calling", "multi-agent" |
| Pattern | When to Use | Example |
|---------|-------------|---------|
| Zero-shot | Simple, well-defined tasks | "Classify this email as spam or not spam" |
| Few-shot | Complex tasks, consistent format needed | Provide 3-5 examples before the task |
| Chain-of-Thought | Reasoning, math, multi-step logic | "Think step by step..." |
| Role Prompting | Expertise needed, specific perspective | "You are an expert tax accountant..." |
| Structured Output | Need parseable JSON/XML | Include schema + format enforcement |
# Prompt Analysis
python scripts/prompt_optimizer.py prompt.txt --analyze # Full analysis
python scripts/prompt_optimizer.py prompt.txt --tokens # Token count only
python scripts/prompt_optimizer.py prompt.txt --optimize # Generate optimized version
# RAG Evaluation
python scripts/rag_evaluator.py --contexts ctx.json --questions q.json # Evaluate
python scripts/rag_evaluator.py --contexts ctx.json --compare baseline # Compare to baseline
# Agent Development
python scripts/agent_orchestrator.py agent.yaml --validate # Validate config
python scripts/agent_orchestrator.py agent.yaml --visualize # Show workflow
python scripts/agent_orchestrator.py agent.yaml --estimate-cost # Token estimation
Generated Mar 1, 2026
A marketing agency uses the skill to optimize prompts for generating client-specific content, ensuring high-quality outputs while minimizing token costs. They implement structured output design for consistent branding across campaigns.
An online learning platform leverages the skill to design few-shot examples for tutoring AI agents, improving student interactions. They evaluate LLM outputs to maintain educational accuracy and adapt workflows for different subjects.
A healthcare analytics firm uses the skill to build agentic systems for processing medical reports, implementing RAG to retrieve relevant research. They optimize prompts for extracting structured data while ensuring compliance and reducing errors.
A fintech startup employs the skill to design prompt templates for generating personalized investment reports, analyzing token usage to control API costs. They create evaluation frameworks to verify financial advice accuracy and regulatory adherence.
An e-commerce company integrates the skill to optimize prompts for AI chatbots handling customer inquiries, using agent orchestrator to manage workflows. They implement RAG evaluator to ensure responses are based on up-to-date product information.
Offer the skill as a cloud-based service with tiered pricing based on usage volume, targeting enterprises needing ongoing prompt optimization and agent management. Revenue streams include monthly subscriptions and pay-per-use API calls.
Provide expert consulting to businesses for custom prompt engineering, agent architecture design, and RAG implementation projects. Revenue is generated through fixed project fees and retainer agreements for continuous support.
Develop and sell training courses and certifications on prompt engineering and AI agent development, leveraging the skill's tools for hands-on exercises. Revenue comes from course enrollments, certification exams, and corporate training packages.
💬 Integration Tip
Integrate the skill into existing AI pipelines by using its command-line tools for batch processing and exporting results in JSON format for easy automation.
Captures learnings, errors, and corrections to enable continuous improvement. Use when: (1) A command or operation fails unexpectedly, (2) User corrects Clau...
Helps users discover and install agent skills when they ask questions like "how do I do X", "find a skill for X", "is there a skill that can...", or express interest in extending capabilities. This skill should be used when the user is looking for functionality that might exist as an installable skill.
Search and analyze your own session logs (older/parent conversations) using jq.
Typed knowledge graph for structured agent memory and composable skills. Use when creating/querying entities (Person, Project, Task, Event, Document), linking related objects, enforcing constraints, planning multi-step actions as graph transformations, or when skills need to share state. Trigger on "remember", "what do I know about", "link X to Y", "show dependencies", entity CRUD, or cross-skill data access.
Ultimate AI agent memory system for Cursor, Claude, ChatGPT & Copilot. WAL protocol + vector search + git-notes + cloud backup. Never lose context again. Vibe-coding ready.
Headless browser automation CLI optimized for AI agents with accessibility tree snapshots and ref-based element selection