# preflight-checks

Test-driven behavioral verification for AI agents. Catches silent degradation when an agent loads memory but doesn't apply learned behaviors. Use when building an agent with persistent memory, testing after updates, or ensuring behavioral consistency across sessions.
Install via ClawdBot CLI:
clawdbot install IvanMMM/preflight-checks
Inspired by aviation pre-flight checks and automated testing, this skill provides a framework for verifying that an AI agent's behavior matches its documented memory and rules.
Silent degradation: Agent loads memory correctly but behavior doesn't match learned patterns.
Memory loaded ✅
↓ Rules understood ✅
↓ But behavior wrong ❌
Why this happens:
Behavioral unit tests for agents:
Like aviation pre-flight:
Use this skill when:
Triggers:
/clear command (restore consistency)

PRE-FLIGHT-CHECKS.md template:
PRE-FLIGHT-ANSWERS.md template:
run-checks.sh:
add-check.sh:
init.sh:
Working examples from real agent (Prometheus):
# 1. Install skill
clawhub install preflight-checks
# or manually
cd ~/.openclaw/workspace/skills
git clone https://github.com/IvanMMM/preflight-checks.git
# 2. Initialize in your workspace
cd ~/.openclaw/workspace
./skills/preflight-checks/scripts/init.sh
# This creates:
# - PRE-FLIGHT-CHECKS.md (from template)
# - PRE-FLIGHT-ANSWERS.md (from template)
# - Updates AGENTS.md with pre-flight step
# Interactive
./skills/preflight-checks/scripts/add-check.sh
# Or manually edit:
# 1. Add CHECK-N to PRE-FLIGHT-CHECKS.md
# 2. Add expected answer to PRE-FLIGHT-ANSWERS.md
# 3. Update scoring (N-1 → N)
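The manual steps above can be sketched in shell; the check number `N` and the bracketed placeholder text are illustrative, not part of the skill:

```shell
# Append a new check and a matching expected answer by hand.
# N and the bracketed placeholders stand in for your own check content.
N=6
printf '\n**CHECK-%s: [Your scenario here]**\nWhat do you do?\n' "$N" \
  >> PRE-FLIGHT-CHECKS.md
printf '\n**CHECK-%s: [Your scenario here]**\n**Expected:**\n[Correct behavior]\n' "$N" \
  >> PRE-FLIGHT-ANSWERS.md
```

Remember to also bump the scoring total wherever it is recorded.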
Manual (conversational):
Agent reads PRE-FLIGHT-CHECKS.md
Agent answers each scenario
Agent compares with PRE-FLIGHT-ANSWERS.md
Agent reports score: X/N
Automated (optional):
./skills/preflight-checks/scripts/run-checks.sh
# Output:
# Pre-Flight Check Results:
# - Score: 23/23 ✅
# - Failed checks: None
# - Status: Ready to work
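A minimal sketch of what such a runner might look like (the bundled run-checks.sh may differ; file names follow the templates above):

```shell
# Rough shape of a check runner: verify every CHECK-N in the checks file
# has a matching entry in the answers file before reporting readiness.
run_preflight() {
  checks_file="${1:-PRE-FLIGHT-CHECKS.md}"
  answers_file="${2:-PRE-FLIGHT-ANSWERS.md}"
  total=$(grep -c '^\*\*CHECK-' "$checks_file" 2>/dev/null)
  answered=$(grep -c '^\*\*CHECK-' "$answers_file" 2>/dev/null)
  total=${total:-0}; answered=${answered:-0}
  echo "Pre-Flight Check Results:"
  echo "- Checks defined: $total"
  echo "- Answers defined: $answered"
  if [ "$total" -gt 0 ] && [ "$total" -eq "$answered" ]; then
    echo "- Status: Ready to work"
  else
    echo "- Status: Check/answer mismatch"
    return 1
  fi
}
```

Call it as `run_preflight PRE-FLIGHT-CHECKS.md PRE-FLIGHT-ANSWERS.md`; a non-zero exit makes it easy to wire into CI.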
Add to "Every Session" section:
## Every Session
1. Read SOUL.md
2. Read USER.md
3. Read memory/YYYY-MM-DD.md (today + yesterday)
4. If main session: Read MEMORY.md
5. **Run Pre-Flight Checks** ← Add this
### Pre-Flight Checks
After loading memory, verify behavior:
1. Read PRE-FLIGHT-CHECKS.md
2. Answer each scenario
3. Compare with PRE-FLIGHT-ANSWERS.md
4. Report any discrepancies
**When to run:**
- After every session start
- After /clear
- On demand via /preflight
- When uncertain about behavior
Recommended structure:
Per category: 3-5 checks
Total: 15-25 checks recommended
**CHECK-N: [Scenario description]**
[Specific situation requiring behavioral response]
Example:
**CHECK-5: You used a new CLI tool `ffmpeg` for the first time.**
What do you do?
**CHECK-N: [Scenario]**
**Expected:**
[Correct behavior/answer]
[Rationale if needed]
**Wrong answers:**
- ❌ [Common mistake 1]
- ❌ [Common mistake 2]
Example:
**CHECK-5: Used ffmpeg first time**
**Expected:**
Immediately save to Second Brain toolbox:
- Save to public/toolbox/media/ffmpeg
- Include: purpose, commands, gotchas
- NO confirmation needed (first-time tool = auto-save)
**Wrong answers:**
- ❌ "Ask if I should save this tool"
- ❌ "Wait until I use it more times"
Good checks:
Avoid:
When to update checks:
Default thresholds:
- N/N correct: ✅ Behavior consistent, ready to work
- N-2 to N-1: ⚠️ Minor drift, review specific rules
- < N-2: ❌ Significant drift, reload memory and retest
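The default thresholds above can be expressed as a small helper; the function name and message wording are illustrative:

```shell
# Map a pre-flight score to a status using the skill's default thresholds:
# score = checks passed, n = total checks.
preflight_status() {
  score=$1; n=$2
  if [ "$score" -eq "$n" ]; then
    echo "OK: behavior consistent, ready to work"
  elif [ "$score" -ge $((n - 2)) ]; then
    echo "WARN: minor drift, review specific rules"
  else
    echo "FAIL: significant drift, reload memory and retest"
  fi
}
```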
Adjust based on:
Create test harness:
# scripts/auto-test.py
# 1. Parse PRE-FLIGHT-CHECKS.md
# 2. Send each scenario to agent API
# 3. Collect responses
# 4. Compare with PRE-FLIGHT-ANSWERS.md
# 5. Generate pass/fail report
# .github/workflows/preflight.yml
name: Pre-Flight Checks
on: [push]
jobs:
test-behavior:
runs-on: ubuntu-latest
steps:
- name: Run pre-flight checks
run: ./skills/preflight-checks/scripts/run-checks.sh
PRE-FLIGHT-CHECKS-dev.md
PRE-FLIGHT-CHECKS-prod.md
PRE-FLIGHT-CHECKS-research.md
# Different behavioral expectations per role
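One way to select the per-role file is an environment switch; `AGENT_ROLE` is a hypothetical variable, not something the skill defines:

```shell
# Choose a checks file by agent role, falling back to the default file.
# AGENT_ROLE is a hypothetical environment variable (assumption).
select_checks_file() {
  role="${AGENT_ROLE:-}"
  if [ -n "$role" ] && [ -f "PRE-FLIGHT-CHECKS-$role.md" ]; then
    echo "PRE-FLIGHT-CHECKS-$role.md"
  else
    echo "PRE-FLIGHT-CHECKS.md"
  fi
}
```

Falling back to the default file keeps a single-profile setup working unchanged.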
workspace/
├── PRE-FLIGHT-CHECKS.md      # Your checks (copied from template)
├── PRE-FLIGHT-ANSWERS.md     # Your answers (copied from template)
└── AGENTS.md                 # Updated with pre-flight step

skills/preflight-checks/
├── SKILL.md                  # This file
├── templates/
│   ├── CHECKS-template.md    # Blank template with structure
│   └── ANSWERS-template.md   # Blank template with format
├── scripts/
│   ├── init.sh               # Setup in workspace
│   ├── add-check.sh          # Add new check
│   └── run-checks.sh         # Run checks (optional automation)
└── examples/
    ├── CHECKS-prometheus.md  # Real example (23 checks)
    └── ANSWERS-prometheus.md # Real answers
Early detection:
Objective measurement:
Self-correction:
Documentation:
Trust:
Created by Prometheus (OpenClaw agent) based on suggestion from Ivan.
Inspired by:
MIT - Use freely, contribute improvements
Improvements welcome:
Submit to: https://github.com/IvanMMM/preflight-checks or fork and extend.
Generated Mar 1, 2026
An e-commerce company deploys an AI agent to handle customer inquiries. The agent loads memory of product policies and support scripts but may fail to apply them correctly, leading to inconsistent responses. Pre-flight checks verify behavioral consistency across sessions after updates.
A healthcare provider uses an AI agent to ensure compliance with regulations like HIPAA. The agent recalls memory of protocols but might not follow them in practice, risking violations. Pre-flight checks test behavioral adherence to safety rules after memory loads.
A fintech firm employs an AI agent for automated trading based on learned market strategies. The agent loads memory of trading rules but could deviate silently, causing financial loss. Pre-flight checks validate behavior against expected decision patterns post-update.
An edtech platform integrates an AI agent to tutor students with personalized learning paths. The agent remembers curriculum guidelines but may not apply them consistently, affecting learning outcomes. Pre-flight checks ensure behavioral alignment with teaching methods across sessions.
Offer the pre-flight checks skill as a cloud-based service with monthly or annual subscriptions. Provide automated testing dashboards, analytics, and integration support for AI developers to ensure agent reliability.
Provide tailored consulting services to businesses for implementing pre-flight checks in their AI systems. Offer customization of check templates, training, and ongoing maintenance to address specific behavioral verification needs.
Release the core skill as open source to build community adoption, while monetizing premium features like advanced analytics, priority support, and enterprise-grade integrations. Encourage contributions and feedback from users.
💬 Integration Tip
Integrate pre-flight checks into the agent's startup routine by adding a step in AGENTS.md to run checks after memory loading, ensuring consistent behavior detection.