b3ehive

Runs three AI agents in parallel to implement, cross-evaluate, and score solutions to the same coding task, then objectively selects the best one.
Install via ClawdBot CLI:

clawdbot install weiyangzen/b3ehive

b3ehive enables competitive code generation: three isolated AI agents implement the same functionality, evaluate each other objectively, and deliver the optimal solution through data-driven selection.
assertions:
  - final_solution/implementation exists and is runnable
  - comparison_report.md exists with objective metrics
  - decision_rationale.md explains selection logic
  - all three agent implementations are documented
  - evaluation scores are numeric and justified
graph TD
A[User Task] --> B[Phase 1: Parallel Spawn]
B --> C[Agent A: Simplicity]
B --> D[Agent B: Speed]
B --> E[Agent C: Robustness]
C --> F[Phase 2: Cross-Evaluation]
D --> F
E --> F
F --> G[6 Evaluation Reports]
G --> H[Phase 3: Self-Scoring]
H --> I[3 Scorecards]
I --> J[Phase 4: Final Delivery]
J --> K[Best Solution]
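The Phase 1 fan-out above can be sketched with Python's `concurrent.futures`; `run_agent` here is a hypothetical stand-in for whatever actually invokes each model and writes its workspace:

```python
from concurrent.futures import ThreadPoolExecutor

FOCUSES = {"a": "simplicity", "b": "speed", "c": "robustness"}

def run_agent(name: str, focus: str, task: str) -> dict:
    # Hypothetical stand-in: a real run would send the agent prompt
    # template to the model and write results to workspace/run_<name>/.
    return {"agent": name, "focus": focus, "task": task}

def spawn_parallel(task: str) -> list:
    # Phase 1: all three agents receive the same task, run in
    # isolation from each other, and execute concurrently.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(run_agent, n, f, task) for n, f in FOCUSES.items()]
        return [fut.result() for fut in futures]

results = spawn_parallel("Implement a thread-safe rate limiter")
```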
Agent Prompt Template:

role: "Expert Software Engineer"
focus: "{{agent_focus}}"  # Simplicity / Speed / Robustness
task: "{{task_description}}"
constraints:
  - Complete runnable code in implementation/
  - Checklist.md with ALL items checked
  - SUMMARY.md with competitive advantages
  - Must differ from other agents' approaches
linter_rules:
  - code_compiles: true
  - tests_pass: true
  - no_todos: true
  - documented: true
assertions:
  - implementation/main.* exists
  - tests exist and pass
  - Checklist.md is complete
  - SUMMARY.md explains unique approach
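The `{{...}}` placeholders suggest simple string substitution. A minimal renderer sketch (assuming plain `{{name}}` placeholders rather than a full Jinja engine):

```python
import re

AGENT_TEMPLATE = """\
role: "Expert Software Engineer"
focus: "{{agent_focus}}"
task: "{{task_description}}"
"""

def render(template: str, values: dict) -> str:
    # Replace each {{name}} placeholder; an unknown name raises early
    # instead of leaking a raw placeholder into the prompt.
    def sub(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"missing template value: {key}")
        return values[key]
    return re.sub(r"\{\{(\w+)\}\}", sub, template)

prompt = render(AGENT_TEMPLATE, {
    "agent_focus": "simplicity",
    "task_description": "Implement quicksort",
})
```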
Evaluation Prompt Template:

evaluator: "Agent {{from}}"
target: "Agent {{to}}"
task: "Objectively prove your solution is superior"
dimensions:
  simplicity:
    weight: 20
    metrics:
      - lines_of_code: count
      - cyclomatic_complexity: calculate
      - readability_score: 1-10
  speed:
    weight: 25
    metrics:
      - time_complexity: big_o
      - space_complexity: big_o
      - benchmark_results: run_if_possible
  stability:
    weight: 25
    metrics:
      - error_handling_coverage: percentage
      - resource_cleanup: check
      - fault_tolerance: test
  corner_cases:
    weight: 20
    metrics:
      - input_validation: comprehensive
      - boundary_conditions: covered
      - edge_cases: tested
  maintainability:
    weight: 10
    metrics:
      - documentation_quality: 1-10
      - code_structure: logical
      - extensibility: easy/hard
assertions:
  - evaluation is objective with data
  - specific code snippets cited
  - numeric scores provided
  - persuasion argument is data-driven
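The five dimension weights sum to 100, and each dimension is scored out of its weight, so an implementation's total is simply the sum of its per-dimension scores. A sketch of that aggregation (the skill's actual math may differ):

```python
WEIGHTS = {
    "simplicity": 20,
    "speed": 25,
    "stability": 25,
    "corner_cases": 20,
    "maintainability": 10,
}

def total_score(scores: dict) -> float:
    # Each dimension is scored out of its weight (e.g. speed out of 25),
    # so a perfect solution totals exactly 100.
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"unscored dimensions: {missing}")
    for dim, value in scores.items():
        if not 0 <= value <= WEIGHTS[dim]:
            raise ValueError(f"{dim} score {value} exceeds max {WEIGHTS[dim]}")
    return sum(scores.values())

assert sum(WEIGHTS.values()) == 100
example = total_score({
    "simplicity": 15, "speed": 20, "stability": 22,
    "corner_cases": 16, "maintainability": 8,
})  # -> 81
```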
Scoring Prompt Template:

agent: "Agent {{name}}"
task: "Fairly score yourself and competitors"
self_evaluation:
  - dimension: simplicity
    max: 20
    score: "{{self_score}}"
    justification: "{{why}}"
  - dimension: speed
    max: 25
    score: "{{self_score}}"
    justification: "{{why}}"
  - dimension: stability
    max: 25
    score: "{{self_score}}"
    justification: "{{why}}"
  - dimension: corner_cases
    max: 20
    score: "{{self_score}}"
    justification: "{{why}}"
  - dimension: maintainability
    max: 10
    score: "{{self_score}}"
    justification: "{{why}}"
peer_evaluation:
  - target: "Agent {{other}}"
    scores: "{{numeric_scores}}"
    comparison: "{{objective_comparison}}"
final_conclusion:
  best_implementation: "[A/B/C/Mixed]"
  reasoning: "{{data_driven_justification}}"
  recommendation: "{{delivery_strategy}}"
assertions:
  - all scores are numeric
  - justifications are specific
  - no inflation or bias
  - conclusion is evidence-based
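With three agents each scoring all three implementations, every implementation ends up with one self-score and two peer scores. One reasonable way to aggregate them is to average all views of each target (an illustrative sketch, not the skill's verified formula):

```python
def aggregate(scorecards: dict) -> dict:
    # scorecards maps (evaluator, target) -> total score (0-100).
    # Each target's final score averages all three views of it, which
    # dilutes any self-score inflation.
    targets = {t for _, t in scorecards}
    result = {}
    for target in targets:
        views = [s for (_, t), s in scorecards.items() if t == target]
        result[target] = sum(views) / len(views)
    return result

cards = {
    ("A", "A"): 90, ("B", "A"): 70, ("C", "A"): 72,
    ("A", "B"): 65, ("B", "B"): 80, ("C", "B"): 78,
    ("A", "C"): 60, ("B", "C"): 62, ("C", "C"): 85,
}
averaged = aggregate(cards)
```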
Decision Logic:

def select_winner(scores):
    """Select the final solution based on competitive scores."""
    margins = calculate_score_margins(scores)
    if margins.winner - margins.second > 15:
        # Clear winner
        return SingleWinner(scores.winner)
    elif margins.winner - margins.second > 5:
        # Close competition: consider a hybrid
        return HybridSolution(scores.top_two)
    else:
        # Very close: pick the simplest
        return SimplestImplementation(scores.all)
assertions:
  - final_solution is runnable
  - comparison_report explains all approaches
  - decision_rationale is transparent
  - attribution is given to winning agent
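The decision logic above references helpers (`calculate_score_margins`, `SingleWinner`, ...) that are not shown. A self-contained, runnable version of the same thresholds, with hypothetical tuple results standing in for those classes:

```python
def select_winner(totals: dict) -> tuple:
    # totals maps agent name -> aggregate score (0-100).
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    (first, s1), (second, s2) = ranked[0], ranked[1]
    margin = s1 - s2
    if margin > 15:
        return ("single", [first])           # clear winner
    elif margin > 5:
        return ("hybrid", [first, second])   # merge the top two
    else:
        return ("simplest", list(totals))    # too close: prefer simplicity

assert select_winner({"A": 90, "B": 70, "C": 60}) == ("single", ["A"])
assert select_winner({"A": 80, "B": 72, "C": 60}) == ("hybrid", ["A", "B"])
```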
workspace/
├── run_a/
│   ├── implementation/       # Agent A code
│   ├── Checklist.md          # Completion checklist
│   ├── SUMMARY.md            # Approach summary
│   ├── evaluation/           # Evaluations of B, C
│   └── SCORECARD.md          # Self-scoring
├── run_b/                    # Same structure
├── run_c/                    # Same structure
├── final/                    # Winning solution
├── COMPARISON_REPORT.md      # Full analysis
└── DECISION_RATIONALE.md     # Why winner selected
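The layout above can be scaffolded with `pathlib` (a sketch; the skill presumably creates these directories itself during a run):

```python
from pathlib import Path
import tempfile

def scaffold(root: str) -> Path:
    # Create the per-agent directories plus the final delivery folder.
    base = Path(root)
    for agent in ("a", "b", "c"):
        run = base / f"run_{agent}"
        (run / "implementation").mkdir(parents=True, exist_ok=True)
        (run / "evaluation").mkdir(exist_ok=True)
        for doc in ("Checklist.md", "SUMMARY.md", "SCORECARD.md"):
            (run / doc).touch()
    (base / "final").mkdir(exist_ok=True)
    return base

ws = scaffold(tempfile.mkdtemp())
```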
Checklist.md must mark every item with - [x] checkboxes; the linter rejects any unchecked [ ] items.

#!/bin/bash
# scripts/lint.sh
lint_agent_output() {
    local agent_dir="$1"
    local errors=0
    # Check required files exist
    for file in Checklist.md SUMMARY.md implementation/main.*; do
        if [[ ! -f "${agent_dir}/${file}" ]]; then
            echo "ERROR: Missing ${file}"
            ((errors++))
        fi
    done
    # Check the checklist is complete
    if grep -q "\[ \]" "${agent_dir}/Checklist.md"; then
        echo "ERROR: Checklist has unchecked items"
        ((errors++))
    fi
    # Check code compiles (language-specific)
    # ... implementation-specific checks
    return $errors
}

# Run on all agents
for agent in a b c; do
    lint_agent_output "workspace/run_${agent}" || exit 1
done
def assert_phase_complete(phase_name):
    """Assert that a phase has completed successfully."""
    assertions = {
        "phase1": [
            "workspace/run_a/implementation exists",
            "workspace/run_b/implementation exists",
            "workspace/run_c/implementation exists",
            "All Checklist.md are complete",
        ],
        "phase2": [
            "6 evaluation reports exist",
            "All evaluations have numeric scores",
        ],
        "phase3": [
            "3 scorecards exist",
            "All scores are numeric",
            "Conclusions are provided",
        ],
        "phase4": [
            "final/solution exists",
            "COMPARISON_REPORT.md exists",
            "DECISION_RATIONALE.md exists",
        ],
    }
    # evaluate() resolves one natural-language assertion against the
    # workspace; its implementation is environment-specific and not
    # shown here.
    for assertion in assertions[phase_name]:
        assert evaluate(assertion), f"Assertion failed: {assertion}"
b3ehive:
  # Agent configuration
  agents:
    count: 3
    model: openai-proxy/gpt-5.3-codex
    thinking: high
    focuses:
      - simplicity
      - speed
      - robustness
  # Evaluation weights (must sum to 100)
  evaluation:
    dimensions:
      simplicity: 20
      speed: 25
      stability: 25
      corner_cases: 20
      maintainability: 10
  # Delivery strategy
  delivery:
    strategy: auto   # auto / best / hybrid
    threshold: 15    # Point margin for a clear winner
  # Quality gates
  quality:
    lint: true
    test: true
    coverage_threshold: 80
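Since the weights must sum to 100 and the threshold is a score margin, a config sanity check is straightforward. A sketch (field names taken from the config above; `validate_config` itself is illustrative, not part of the skill):

```python
def validate_config(cfg: dict) -> None:
    # Reject configs whose evaluation weights or delivery settings
    # cannot produce a meaningful 0-100 competition.
    weights = cfg["evaluation"]["dimensions"]
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"dimension weights sum to {total}, expected 100")
    if cfg["agents"]["count"] != len(cfg["agents"]["focuses"]):
        raise ValueError("one focus is required per agent")
    if not 0 < cfg["delivery"]["threshold"] <= 100:
        raise ValueError("threshold must be a score margin between 1 and 100")

config = {
    "agents": {"count": 3, "focuses": ["simplicity", "speed", "robustness"]},
    "evaluation": {"dimensions": {
        "simplicity": 20, "speed": 25, "stability": 25,
        "corner_cases": 20, "maintainability": 10,
    }},
    "delivery": {"threshold": 15},
}
validate_config(config)  # the defaults above pass
```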
# Basic usage
b3ehive "Implement a thread-safe rate limiter"

# With constraints
b3ehive "Implement quicksort" --lang python --max-lines 50

# Using OpenClaw CLI
openclaw skills run b3ehive --task "Your task"
MIT © Weiyang (@weiyangzen)
Generated Mar 1, 2026
Financial firms need to develop and optimize trading algorithms under strict performance constraints. B3ehive can generate three competing implementations (simple, fast, robust) for backtesting, evaluate them objectively, and select the most reliable solution for deployment.
Security teams require efficient code to scan networks for vulnerabilities with minimal false positives. B3ehive produces multiple scanning approaches, cross-evaluates them for speed and accuracy, and delivers a robust solution that balances detection rates with resource usage.
Online retailers need personalized recommendation systems that handle high traffic and diverse user data. B3ehive creates implementations focused on simplicity, speed, and robustness, then selects the optimal engine based on performance metrics and maintainability for integration into production.
Medical institutions process sensitive patient data with requirements for accuracy and compliance. B3ehive generates three pipeline variants, evaluates them for stability and error handling, and chooses the most reliable implementation to ensure data integrity and regulatory adherence.
Offer B3ehive as a cloud-based service where users submit coding tasks via API. Charge subscription tiers based on usage volume and complexity, providing automated reports and integration with development tools for continuous code optimization.
License B3ehive to large organizations for internal use in software development teams. Provide custom training, support, and integration services, with pricing based on the number of developers or projects to enhance code quality and reduce review time.
Deploy B3ehive as part of consulting engagements to help clients solve specific coding challenges. Offer tailored implementations and analysis reports, charging per project or on a retainer basis for ongoing optimization and competitive benchmarking.
💬 Integration Tip
Integrate B3ehive into CI/CD pipelines to automatically generate and evaluate code variants during development, ensuring only the best solutions are deployed.
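One way to wire that gate into CI is a small wrapper that runs the CLI and checks for the final deliverable. This sketch rests on two assumptions not verified by the docs above: the CLI exits non-zero on failure, and the winning code lands under workspace/final/.

```python
import subprocess
from pathlib import Path

def ci_gate(cmd: list, workspace: str = "workspace") -> bool:
    # Run b3ehive (or any command) as a CI quality gate.  ASSUMPTION:
    # the CLI exits non-zero on failure and writes the winning solution
    # to <workspace>/final/ -- inferred, not verified.
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        return False  # CLI not installed on this runner
    if result.returncode != 0:
        return False
    return (Path(workspace) / "final").is_dir()

# Gate a build on the skill's output (returns False if b3ehive is absent):
ok = ci_gate(["b3ehive", "Implement a thread-safe rate limiter"])
```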