Best OpenClaw Skills for Code Review & Testing: Debug, Test, Refactor with AI
Writing code is the easy part. The hard part is knowing it works — and keeping it working as requirements change. OpenClaw's code quality ecosystem has grown significantly, covering the full quality lifecycle from debugging a mysterious runtime error to enforcing TDD discipline across a project. This guide covers skills across five stages of the code quality workflow.
Note: Install and download figures in text descriptions reflect stats at the time of writing and may be outdated. All skill tables are live — they fetch current data from the ClawHub database on every page load. Treat table values as authoritative.
By the Numbers
| Metric | Value |
|---|---|
| Skills in this guide | 30+ |
| Workflow stages covered | 5 |
| Top skill by installs | debug-pro (49 installs at time of writing) |
| Top skill by downloads | debug-pro (9,671 downloads at time of writing) |
| Skills with install records | ~20 |
1. Debugging Protocols
Debugging without a protocol is just guessing. These skills encode structured debugging workflows — systematic root cause investigation, runtime trace analysis, and anti-give-up protocols for when the obvious fixes aren't working. debug-pro is the clear leader (49 installs, 9,671 downloads), implementing a 7-step protocol with language-specific commands. runesleo-systematic-debugging offers a complementary four-phase framework that forces root cause investigation before jumping to fixes.
2. Test Writing & TDD
The split between "test-writing" and "TDD" matters here. Most skills in this stage handle both, but the philosophy differs: test-writing skills help you add tests to existing code; TDD skills enforce the red-green-refactor discipline from the start. test-runner leads with 35 installs and covers TypeScript, Python, Go, and Rust in one skill. tdd-guide (7 installs) is the most comprehensive TDD implementation, generating tests first across Jest, Vitest, and Pytest.
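The red-green-refactor cycle can be sketched in a few lines of Python. This is an illustration only, not output from tdd-guide or test-runner; the `slugify` function and its test are hypothetical names chosen for the sketch. The test is written first (red), followed by a simple implementation that passes it (green), and then a cleanup that must keep the same test passing (refactor).

```python
import re


# Step 1 (red): write the test before the implementation exists.
# Calling it at this point would fail with NameError: slugify is undefined.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Trim  me ") == "trim-me"
    assert slugify("A--B") == "a-b"


# Step 2 (green): a simple implementation that makes the test pass.
def slugify(text: str) -> str:
    words = re.split(r"[^A-Za-z0-9]+", text)
    return "-".join(w.lower() for w in words if w)


test_slugify()  # passes against the first implementation

# Step 3 (refactor): restructure, then rerun the same test unchanged.
_NON_ALNUM = re.compile(r"[^a-z0-9]+")


def slugify(text: str) -> str:  # this redefinition replaces the first version
    return _NON_ALNUM.sub("-", text.lower()).strip("-")


test_slugify()  # still passes after the refactor
```

In a real project the test would live in its own file and run under a test runner such as Pytest; the inline calls here only keep the sketch self-contained.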
3. E2E & Integration Testing
End-to-end testing is where most teams struggle — flaky selectors, timing issues, and environments that work locally but fail in CI. e2e-testing-patterns (10 installs) covers Playwright and Cypress with a focus on reliability patterns: deterministic selectors, proper wait strategies, and test isolation. Coverage analysis for smart contracts is a separate niche handled by sui-coverage (Sui Move).
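To make "proper wait strategies" concrete, here is a minimal sketch of the underlying idea: poll for a condition with a deadline instead of sleeping for a fixed time. The `wait_until` helper and the `FakePage` class are hypothetical and exist only for this illustration; they are not part of e2e-testing-patterns, Playwright, or Cypress, which provide their own waiting mechanisms.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.05) -> None:
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


class FakePage:
    """Stand-in for an app that becomes ready after an unknown delay."""

    def __init__(self, ready_after: float) -> None:
        self._ready_at = time.monotonic() + ready_after

    def is_ready(self) -> bool:
        return time.monotonic() >= self._ready_at


page = FakePage(ready_after=0.2)
wait_until(page.is_ready, timeout=2.0)  # returns once the page reports ready
```

A fixed `time.sleep(2)` would either waste time on fast machines or fail on slow CI runners. Polling against a deadline returns as soon as the condition holds and fails with a clear error when it never does.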
4. Code Review Automation
Automated code review has two useful modes: reviewing PRs/diffs before merge, and enforcing standards during development. quack-code-review (13 installs) connects to LogicArt to find bugs and security issues in code. afrexai-code-reviewer focuses on enterprise PRs with multi-language support. receiving-code-review is the underrated one — it helps developers process feedback they've received rather than generate it, which is a gap most tools ignore. sonarqube-analyzer bridges to a self-hosted SonarQube instance for teams that already have that infrastructure.
5. Refactoring
Refactoring without tests is risky; refactoring without a plan is dangerous. Skills here implement the safety net: refactor-safely takes small steps with test checkpoints at each stage. jarvis-refactor-planner-01 generates the refactor plan (seams, rollback points) before touching any code. clean-code enforces pragmatic standards — no docstrings on every function, no over-engineering — which is a more useful guide than abstract principles.
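The "small steps with test checkpoints" idea can be sketched as follows. This is a generic illustration and not the actual behavior of refactor-safely; `legacy_total`, `checkpoint`, and the sample cases are hypothetical. The outputs of the legacy code are recorded first, and every refactoring step must reproduce them before the next step begins.

```python
def legacy_total(items):
    # Original code: works, but mixes filtering, pricing, and summing.
    total = 0
    for item in items:
        if item["qty"] > 0:
            if item.get("discount"):
                total += item["price"] * item["qty"] * (1 - item["discount"])
            else:
                total += item["price"] * item["qty"]
    return total


# Record current behavior before touching anything.
CASES = [
    [],
    [{"price": 10.0, "qty": 2}],
    [{"price": 10.0, "qty": 0}, {"price": 4.0, "qty": 1, "discount": 0.5}],
]
EXPECTED = [legacy_total(case) for case in CASES]


def checkpoint(candidate):
    """Fail if `candidate` differs from the recorded legacy outputs."""
    for case, expected in zip(CASES, EXPECTED):
        assert candidate(case) == expected, case


# Step 1: extract the per-line pricing rule, keep the loop as-is.
def line_price(item):
    return item["price"] * item["qty"] * (1 - item.get("discount", 0))


def total_step1(items):
    total = 0
    for item in items:
        if item["qty"] > 0:
            total += line_price(item)
    return total


checkpoint(total_step1)


# Step 2: replace the loop with a generator expression.
def total_step2(items):
    return sum(line_price(item) for item in items if item["qty"] > 0)


checkpoint(total_step2)
```

If a checkpoint fails, only the most recent small step needs to be reverted, which is the rollback point the paragraph above describes.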
Recommended Combinations
| Your situation | Recommended stack |
|---|---|
| Starting a new feature with TDD | tdd-guide + test-patterns |
| Debugging a hard-to-reproduce bug | debug-pro + runesleo-systematic-debugging |
| PR review before merge | afrexai-code-reviewer + clean-code-review |
| Adding E2E tests to an existing app | e2e-testing-patterns + test-runner |
| Safe refactor of legacy code | refactor-safely + jarvis-refactor-planner-01 |
| Processing code review feedback | receiving-code-review |
A Few Observations
Debugging dominates by install count. At 49 installs, debug-pro leads every skill cited in this guide: about 1.4× test-runner's 35 and nearly 4× quack-code-review's 13. Debugging is universal, since every developer needs it regardless of language, framework, or project type, which makes it the category's clearest winner.
TDD has high downloads but modest installs. tdd-guide has 2,736 downloads but only 7 installs, suggesting many developers try TDD skills without committing to them long-term. This tracks with the broader industry pattern: TDD is widely admired, narrowly practiced.
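The gap is easier to see as a downloads-per-install ratio. The short calculation below uses only the figures quoted in this guide (stats at time of writing); the ratio itself is an illustrative metric chosen for this sketch, not something ClawHub reports.

```python
# Install and download counts as quoted in this guide.
stats = {
    "debug-pro": {"installs": 49, "downloads": 9671},
    "tdd-guide": {"installs": 7, "downloads": 2736},
}


def downloads_per_install(name: str) -> float:
    """Return how many downloads a skill has for each recorded install."""
    s = stats[name]
    return s["downloads"] / s["installs"]


for name in stats:
    print(f"{name}: {downloads_per_install(name):.0f} downloads per install")
# debug-pro: 197 downloads per install
# tdd-guide: 391 downloads per install
```

By this rough measure, tdd-guide sees about twice as many downloads per install as debug-pro, which is consistent with the "tried but not kept" reading above.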
The pua-debugging variants are surprisingly popular. Skills that use "corporate PUA rhetoric" to force exhaustive debugging (essentially guilt-tripping the agent into not giving up) have accumulated hundreds of downloads. They work by changing the agent's behavioral stance rather than its technical approach. Download counts show interest rather than effectiveness, but that interest suggests users expect emotional framing to affect how persistently an agent debugs.
E2E coverage is thin. Only two skills focus specifically on E2E testing. Given how much pain E2E flakiness causes in practice, this is the most underserved area in the code quality ecosystem.
Data source: ClawHub platform install and download counts as of April 10, 2026. Visit clawhub-skills.com to search for more skills.