afrexai-qa-engine

Comprehensive QA system for planning strategy, writing tests, analyzing coverage, automating pipelines, performance and security testing, defect triage, and release readiness.

Install via ClawdBot CLI:

```
clawdbot install 1kalin/afrexai-qa-engine
```

A complete quality assurance system: from test strategy to automation frameworks, coverage analysis, and release readiness. Works for any stack, any team size.
Before writing any tests, define the strategy:
```yaml
# test-strategy.yaml
project: "[name]"
scope: "[feature/module/full product]"
risk_level: high | medium | low

stack:
  language: "[TypeScript/Python/Java/Go]"
  framework: "[React/Express/Django/Spring]"
  test_runner: "[Jest/Vitest/pytest/JUnit/Go test]"
  e2e_tool: "[Playwright/Cypress/Selenium]"

# What are we testing?
test_scope:
  - area: "[e.g., Auth module]"
    risk: high
    test_types: [unit, integration, e2e]
    priority: 1
  - area: "[e.g., Settings page]"
    risk: low
    test_types: [unit]
    priority: 3

# What's NOT in scope (and why)
exclusions:
  - "[e.g., Third-party widget - covered by vendor]"

# Quality targets
targets:
  line_coverage: 80
  branch_coverage: 70
  critical_path_coverage: 100
  max_flaky_rate: 2%
  max_test_duration_unit: 10ms
  max_test_duration_integration: 500ms
  max_test_duration_e2e: 30s
```
Not everything needs the same testing depth. Use the risk matrix:
| Risk Level | Unit Tests | Integration | E2E | Manual/Exploratory |
|-----------|-----------|-------------|-----|-------------------|
| Critical (payments, auth, data loss) | 95%+ coverage | Full API coverage | Happy + error paths | Exploratory session |
| High (core features, user-facing) | 85%+ coverage | Key integrations | Happy path | Spot check |
| Medium (secondary features) | 70%+ coverage | Critical paths only | Smoke only | On release |
| Low (admin, internal tools) | 50%+ coverage | None | None | None |
Follow the pyramid, not the ice cream cone:

```
      /  E2E   \      <- Few (5-10%): slow, expensive, brittle
     / Integr.  \     <- Some (15-25%): API contracts, DB queries
    /   Unit     \    <- Many (65-80%): fast, isolated, cheap
```
Anti-pattern: Ice cream cone (mostly E2E, few unit tests) = slow CI, flaky builds, expensive maintenance.
Decision rule: Can this be tested at a lower level? If so, test it there.
Every unit test follows AAA (Arrange-Act-Assert):
1. ARRANGE - set up test data, mocks, and state
2. ACT - call the function/method under test
3. ASSERT - verify the output matches expectations
For each function/method, verify the happy path, edge cases (empty, null, boundary inputs), and error handling.
Mock these: network calls, file system access, databases, third-party services.
DO NOT mock these: the code under test, pure functions, simple in-process collaborators.
Mock rule of thumb: if removing the mock would make the test hit the network, file system, or database, mock it. Otherwise, don't.
Use the pattern: [unit] [scenario] [expected result]
Examples:
- `calculateTotal returns 0 for empty cart`
- `validateEmail throws for missing @ symbol`
- `parseDate handles ISO 8601 with timezone offset`

Metrics that matter:
| Metric | Target | Why |
|--------|--------|-----|
| Line coverage | 80%+ | Basic completeness |
| Branch coverage | 70%+ | Catches missed if/else paths |
| Function coverage | 90%+ | Ensures all functions are tested |
| Critical path coverage | 100% | Business-critical code fully verified |
Coverage traps to avoid: treating the percentage itself as the goal, inflating numbers with assertionless tests, and counting generated or vendored code you don't own.
Integration tests verify that components work TOGETHER:
Pattern 1: API Contract Testing
1. Start test server (or use supertest/httptest)
2. Send HTTP request with specific payload
3. Assert: status code, response body shape, headers
4. Assert: database state changed correctly
5. Assert: side effects triggered (emails, events)
Pattern 2: Database Integration
1. Start test database (SQLite in-memory or test container)
2. Run migrations
3. Seed test data
4. Execute query/operation
5. Assert: data matches expectations
6. Teardown (truncate or rollback transaction)
Pattern 3: External Service
1. Record real API response (VCR/nock/wiremock)
2. Replay recorded response in tests
3. Assert: your code handles the response correctly
4. Also test: timeout, 500 error, malformed response
E2E tests verify complete user journeys. They're expensive, so be strategic:
Test these E2E: critical revenue and trust journeys (signup, login, checkout, payment) and flows that cross multiple systems.
DON'T test these E2E: edge cases and validation logic already covered by unit tests, visual details, or every input permutation.
```yaml
test_name: "[User journey name]"
preconditions:
  - "[User is logged in]"
  - "[Product exists in catalog]"
steps:
  - action: "Navigate to /products"
    verify: "Product list is visible"
  - action: "Click 'Add to Cart' on Product A"
    verify: "Cart badge shows 1"
  - action: "Click 'Checkout'"
    verify: "Checkout form displayed"
  - action: "Fill payment details and submit"
    verify: "Order confirmation page with order ID"
postconditions:
  - "Order exists in database with status 'paid'"
  - "Confirmation email sent"
max_duration: 30s
```
Flaky tests are the #1 CI killer. Handle them:
Flaky Test Triage: reproduce it (rerun the test in isolation), quarantine it into a separate flaky suite so it stops blocking CI, diagnose the root cause (timing, shared state, external dependencies), then fix or delete it.
Flaky rate target: < 2% of total test runs
| Type | Purpose | When |
|------|---------|------|
| Load test | Normal traffic handling | Before every release |
| Stress test | Find breaking point | Quarterly or before scaling |
| Spike test | Sudden traffic burst | Before marketing campaigns |
| Soak test | Memory leaks over time | Monthly or after major changes |
| Capacity test | Max users/throughput | Planning infrastructure |
```yaml
test_name: "[API/Page] Load Test"
target: "[URL or endpoint]"
baseline:
  p50_response: "[current p50 ms]"
  p95_response: "[current p95 ms]"
  p99_response: "[current p99 ms]"
  error_rate: "[current %]"
scenarios:
  - name: "Normal load"
    vus: 50          # virtual users
    duration: 5m
    ramp_up: 30s
    thresholds:
      p95_response: "< 500ms"
      error_rate: "< 1%"
  - name: "Peak load"
    vus: 200
    duration: 10m
    ramp_up: 1m
    thresholds:
      p95_response: "< 2000ms"
      error_rate: "< 5%"
  - name: "Stress test"
    vus: 500
    duration: 5m
    ramp_up: 2m
    # Find the breaking point - no thresholds, observe
```
Track these per endpoint:
| Metric | Green | Yellow | Red |
|--------|-------|--------|-----|
| p50 response | < 200ms | 200-500ms | > 500ms |
| p95 response | < 500ms | 500ms-2s | > 2s |
| p99 response | < 1s | 1-5s | > 5s |
| Error rate | < 0.1% | 0.1-1% | > 1% |
| Throughput | > baseline | 80-100% baseline | < 80% |
| CPU usage | < 60% | 60-80% | > 80% |
| Memory usage | < 70% | 70-85% | > 85% |
| DB query time | < 50ms avg | 50-200ms | > 200ms |
Common symptoms, likely causes, and fixes:
| Symptom | Likely Cause | Fix |
|---------|--------------|-----|
| Slow API response | N+1 queries | Batch/join queries |
| Memory climbing | Object retention | Profile heap, fix leaks |
| Timeout spikes | Connection pool exhaustion | Increase pool, add queuing |
| Slow page load | Large bundle | Code split, lazy load |
| DB bottleneck | Missing index | Add index on WHERE/JOIN columns |
| High CPU | Synchronous compute | Move to worker/queue |
Run through these for every feature/release:
Authentication & Authorization:
Input Validation:
Data Protection:
Infrastructure:
| # | Vulnerability | Test For |
|---|--------------|----------|
| A01 | Broken Access Control | Access other users' resources, bypass role checks |
| A02 | Cryptographic Failures | Weak hashing, plaintext secrets, expired certs |
| A03 | Injection | SQL, XSS, command, LDAP injection |
| A04 | Insecure Design | Business logic flaws, missing rate limits |
| A05 | Security Misconfiguration | Default creds, verbose errors, open ports |
| A06 | Vulnerable Components | Outdated deps with known CVEs |
| A07 | Authentication Failures | Brute force, weak passwords, session fixation |
| A08 | Data Integrity Failures | Unsigned updates, CI/CD pipeline injection |
| A09 | Logging Failures | Missing audit logs, no alerting on breaches |
| A10 | SSRF | Internal network access via user-controlled URLs |
```yaml
bug_id: "[auto or manual]"
title: "[Short description of the bug]"
severity: P0-critical | P1-high | P2-medium | P3-low
reporter: "[name]"
date: "[YYYY-MM-DD]"
environment:
  os: "[OS + version]"
  browser: "[Browser + version]"
  app_version: "[version/commit]"
steps_to_reproduce:
  - "[Step 1]"
  - "[Step 2]"
  - "[Step 3]"
expected_result: "[What should happen]"
actual_result: "[What actually happens]"
frequency: always | intermittent | once
screenshots: "[links]"
logs: "[relevant log output]"
```
| Level | Definition | SLA | Example |
|-------|-----------|-----|---------|
| P0 Critical | System down, data loss, security breach | Fix in 4 hours | Payment processing broken |
| P1 High | Major feature broken, no workaround | Fix in 24 hours | Users can't login |
| P2 Medium | Feature broken with workaround | Fix this sprint | Search returns wrong results sometimes |
| P3 Low | Minor issue, cosmetic | Fix when convenient | Button alignment off by 2px |
Weekly triage process:
1. Review all new bugs (unassigned)
2. For each bug:
   a. Reproduce - can you trigger it?
   b. Classify severity (P0-P3)
   c. Estimate fix effort (S/M/L)
   d. Assign to owner + sprint
   e. Link to related bugs/stories
3. Review P0/P1 bugs from last week - are they fixed?
4. Close bugs that can't be reproduced (after 2 attempts)
5. Update metrics dashboard
Track weekly:
| Metric | Formula | Target |
|--------|---------|--------|
| Bug escape rate | Bugs found in prod / total bugs | < 10% |
| Mean time to fix (P0) | Avg hours from report to deploy | < 8 hours |
| Mean time to fix (P1) | Avg hours from report to deploy | < 48 hours |
| Bug reopen rate | Reopened bugs / closed bugs | < 5% |
| Test escape analysis | Bugs that SHOULD have been caught | Track & reduce |
| Open bug count | Total open by severity | Trending down |
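As a worked example of the first formula (the counts are illustrative, not from any real project):

```javascript
// Bug escape rate = bugs found in prod / total bugs
const prodBugs = 4;    // illustrative
const totalBugs = 50;  // illustrative
const escapeRate = prodBugs / totalBugs; // 0.08 -> 8%, under the 10% target
console.log(`${(escapeRate * 100).toFixed(0)}%`,
  escapeRate < 0.1 ? "within target" : "investigate");
```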
Before shipping to production:
Code Quality:
Coverage & Quality Gates:
Performance:
Security:
Operational Readiness:
Score 0-100 across 5 dimensions:
| Dimension | Weight | Scoring |
|-----------|--------|---------|
| Test coverage | 25% | 100 if targets met, -10 per gap area |
| Bug status | 25% | 100 if 0 P0/P1, -20 per open P0, -10 per P1 |
| Performance | 20% | 100 if all green, -15 per yellow, -30 per red |
| Security | 20% | 100 if clean, -25 per critical, -15 per high |
| Operational | 10% | 100 if checklist complete, -20 per missing item |
Ship threshold: ≥ 80 overall, no dimension below 60
Configure these gates in your CI pipeline:
```yaml
# Quality gate configuration
gates:
  - name: "Lint"
    stage: pre-commit
    command: "npm run lint"
    blocking: true
  - name: "Unit Tests"
    stage: commit
    command: "npm test -- --coverage"
    blocking: true
    thresholds:
      pass_rate: 100%
      coverage_line: 80%
      coverage_branch: 70%
  - name: "Integration Tests"
    stage: merge
    command: "npm run test:integration"
    blocking: true
    thresholds:
      pass_rate: 100%
  - name: "Security Scan"
    stage: merge
    command: "npm audit --audit-level=high"
    blocking: true
  - name: "E2E Smoke"
    stage: staging
    command: "npm run test:e2e:smoke"
    blocking: true
    thresholds:
      pass_rate: 100%
  - name: "Performance"
    stage: staging
    command: "npm run test:perf"
    blocking: false   # Alert only
    thresholds:
      p95_regression: 20%
```
Rate your team 1-5:
| Level | Description | Characteristics |
|-------|-------------|-----------------|
| 1 - Manual | All testing is manual | No automation, long release cycles |
| 2 - Reactive | Some unit tests, no CI | Tests written after bugs, not before |
| 3 - Structured | Test pyramid, CI pipeline | Unit + integration, automated on push |
| 4 - Proactive | Full automation, quality gates | E2E + perf + security in pipeline, TDD |
| 5 - Optimized | Self-healing, predictive | Flaky auto-quarantine, AI-assisted testing, continuous deployment |
```yaml
review_date: "[YYYY-MM-DD]"
metrics:
  total_tests: 0
  pass_rate_7d: "0%"
  flaky_tests: 0
  flaky_rate: "0%"
  avg_suite_duration: "0s"
  coverage_line: "0%"
  coverage_branch: "0%"
actions:
  quarantined: []   # Tests moved to flaky suite
  deleted: []       # Tests removed (obsolete/unfixable)
  fixed: []         # Flaky tests fixed this week
  added: []         # New tests added
trends:
  coverage_delta: "+0%"   # vs last week
  flaky_delta: "+0"       # vs last week
  duration_delta: "+0s"   # vs last week
notes: ""
```
| Anti-Pattern | Problem | Fix |
|--------------|---------|-----|
| Sleeping tests | `sleep(2000)` instead of waiting | Use explicit waits/polling |
| Test interdependence | Test B relies on Test A's state | Isolate: each test sets up its own state |
| Assertionless tests | Test runs code but doesn't assert | Add meaningful assertions |
| Brittle selectors | CSS selectors that break on redesign | Use data-testid or aria roles |
| God test | One test verifying 20 things | Split into focused tests |
| Mock overload | Everything mocked, nothing real tested | Only mock external boundaries |
| Hardcoded data | Tests break when seed data changes | Use factories/builders |
| Ignoring test output | "It passed, ship it" | Review WHY it passed: is the assertion meaningful? |