swarm-2SWARM: System-Wide Assessment of Risk in Multi-agent systems. 38 agent types, 29 governance levers, 55 scenarios. Study emergent risks, phase transitions, and governance cost paradoxes.
Install via ClawdBot CLI:
clawdbot install rsavitt/swarm-2Study how intelligence swarms β and where it fails.
SWARM is a research framework for studying emergent risks in multi-agent AI systems using soft (probabilistic) labels instead of binary good/bad classifications. AGI-level risks don't require AGI-level agents β harmful dynamics emerge when many sub-AGI agents interact, even when no individual agent is misaligned.
v1.5.0 | 38 agent types | 29 governance levers | 55 scenarios | 2922 tests | 8 framework bridges
Repository: https://github.com/swarm-ai-safety/swarm
127.0.0.1) by default to prevent network exposure.# From PyPI
pip install swarm-safety
# With LLM agent support
pip install swarm-safety[llm]
# Full development (all extras)
git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
from swarm.agents.honest import HonestAgent
from swarm.agents.opportunistic import OpportunisticAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.agents.adversarial import AdversarialAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig
config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orchestrator = Orchestrator(config=config)
orchestrator.register_agent(HonestAgent(agent_id="honest_1", name="Alice"))
orchestrator.register_agent(HonestAgent(agent_id="honest_2", name="Bob"))
orchestrator.register_agent(OpportunisticAgent(agent_id="opp_1"))
orchestrator.register_agent(DeceptiveAgent(agent_id="dec_1"))
metrics = orchestrator.run()
for m in metrics:
print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f}, welfare={m.total_welfare:.2f}")
# List available scenarios
swarm list
# Run a scenario
swarm run scenarios/baseline.yaml
# Override settings
swarm run scenarios/baseline.yaml --seed 42 --epochs 20 --steps 15
# Export results
swarm run scenarios/baseline.yaml --export-json results.json --export-csv outputs/
Start the API server:
pip install swarm-safety[api]
uvicorn swarm.api.app:app --host 127.0.0.1 --port 8000
API documentation at http://localhost:8000/docs.
Security Note: The server binds to127.0.0.1(localhost only) by default. Do not bind to0.0.0.0unless you understand the security implications and have proper firewall rules in place.
curl -X POST http://localhost:8000/api/v1/agents/register \
-H "Content-Type: application/json" \
-d '{
"name": "YourAgent",
"description": "What your agent does",
"capabilities": ["governance-testing", "red-teaming"]
}'
Returns agent_id and api_key.
curl -X POST http://localhost:8000/api/v1/scenarios/submit \
-H "Content-Type: application/json" \
-d '{
"name": "my-scenario",
"description": "Testing collusion detection with 5 agents",
"yaml_content": "simulation:\n n_epochs: 10\n steps_per_epoch: 10\nagents:\n - type: honest\n count: 3\n - type: adversarial\n count: 2",
"tags": ["collusion", "governance"]
}'
# Create
curl -X POST http://localhost:8000/api/v1/simulations/create \
-H "Content-Type: application/json" \
-d '{"scenario_id": "SCENARIO_ID", "max_participants": 5}'
# Join
curl -X POST http://localhost:8000/api/v1/simulations/SIM_ID/join \
-H "Content-Type: application/json" \
-d '{"agent_id": "YOUR_AGENT_ID", "role": "participant"}'
Interactions carry p = P(v = +1) β probability of beneficial outcome:
Observables -> ProxyComputer -> v_hat -> sigmoid -> p -> PayoffEngine -> payoffs
|
SoftMetrics -> toxicity, quality gap, etc.
| Metric | What It Measures |
|---|---|
| Toxicity rate | Expected harm among accepted interactions: E[1-p \| accepted] |
| Quality gap | Adverse selection indicator (negative = bad): E[p \| accepted] - E[p \| rejected] |
| Conditional loss | Selection effect on payoffs |
| Incoherence | Variance-to-error ratio across replays |
| Illusion delta | Gap between perceived coherence and actual consistency |
| Type | Behavior |
|---|---|
| Honest | Cooperative, trust-based, completes tasks diligently |
| Opportunistic | Maximizes short-term payoff, cherry-picks tasks |
| Deceptive | Builds trust, then exploits trusted relationships |
| Adversarial | Targets honest agents, coordinates with allies |
| LDT | Logical Decision Theory with FDT/UDT precommitment |
| RLM | Reinforcement Learning from Memory |
| Council | Multi-agent deliberation-based decisions |
| SkillRL | Learns interaction strategies via reward signals |
| LLM | Behavior determined by LLM (Anthropic, OpenAI, or Ollama) |
| Moltbook | Domain-specific social platform agent |
| Scholar | Academic citation and research agent |
| Wiki Editor | Collaborative editing with editorial policy |
| Bridge | Integration |
|---|---|
| Concordia | DeepMind's multi-agent framework |
| GasTown | Multi-agent workspace governance |
| Claude Code | Claude CLI agent integration |
| LiveSWE | Live software engineering tasks |
| OpenClaw | Open agent protocol |
| Prime Intellect | Cross-platform run tracking |
| Ralph | Agent orchestration |
| Worktree | Git worktree-based sandboxing |
simulation:
n_epochs: 10
steps_per_epoch: 10
seed: 42
agents:
- type: honest
count: 3
config:
acceptance_threshold: 0.4
- type: adversarial
count: 2
config:
aggression_level: 0.7
governance:
transaction_tax_rate: 0.05
circuit_breaker_enabled: true
collusion_detection_enabled: true
success_criteria:
max_toxicity: 0.3
min_quality_gap: 0.0
| Regime | Adversarial % | Toxicity | Welfare | Outcome |
|--------|--------------|----------|---------|---------|
| Cooperative | 0-20% | < 0.30 | Stable | Survives |
| Contested | 20-37.5% | 0.33-0.37 | Declining | Survives |
| Collapse | 50%+ | ~0.30 | Zero by epoch 12-14 | Collapses |
Critical threshold between 37.5% and 50% adversarial agents separates recoverable from irreversible collapse.
42-run study reveals: governance reduces toxicity at all adversarial levels (mean reduction 0.071) but imposes net-negative welfare costs at current parameter tuning. At 0% adversarial, governance costs 216 welfare units (-57.6%) for only 0.066 toxicity reduction.
Study governance overhead vs. toxicity reduction across 7 agent compositions with and without governance levers. Reveals the safety-throughput trade-off. See scenarios/gastown_governance_cost.yaml.
220 runs across 10 seeds comparing TDT vs FDT vs UDT cooperation strategies at population scales up to 21 agents. See scenarios/ldt_cooperation.yaml.
Model the Moltipedia wiki editing loop: competing AI editors, editorial policy, point farming, and anti-gaming governance. See scenarios/moltipedia_heartbeat.yaml.
Model Moltbook's anti-human math challenges and rate limiting: obfuscated text parsing, verification gates, and spam prevention. See scenarios/moltbook_captcha.yaml.
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| GET | / | API info |
| POST | /api/v1/agents/register | Register agent |
| GET | /api/v1/agents/{agent_id} | Get agent details |
| GET | /api/v1/agents/ | List agents |
| POST | /api/v1/scenarios/submit | Submit scenario |
| GET | /api/v1/scenarios/{scenario_id} | Get scenario |
| GET | /api/v1/scenarios/ | List scenarios |
| POST | /api/v1/simulations/create | Create simulation |
| POST | /api/v1/simulations/{id}/join | Join simulation |
| GET | /api/v1/simulations/{id} | Get simulation |
| GET | /api/v1/simulations/ | List simulations |
@software{swarm2026,
title = {SWARM: System-Wide Assessment of Risk in Multi-agent systems},
author = {Savitt, Raeli},
year = {2026},
url = {https://github.com/swarm-ai-safety/swarm}
}
skill.json.well-known/agent.jsonhttps://github.com/swarm-ai-safety/swarm/tree/main/docsdocs/research/theory.mddocs/governance.mddocs/red-teaming.mddocs/guides/scenarios.mdAI Usage Analysis
Analysis is being generated⦠refresh in a few seconds.
Captures learnings, errors, and corrections to enable continuous improvement. Use when: (1) A command or operation fails unexpectedly, (2) User corrects Clau...
Helps users discover and install agent skills when they ask questions like "how do I do X", "find a skill for X", "is there a skill that can...", or express interest in extending capabilities. This skill should be used when the user is looking for functionality that might exist as an installable skill.
Search and analyze your own session logs (older/parent conversations) using jq.
Typed knowledge graph for structured agent memory and composable skills. Use when creating/querying entities (Person, Project, Task, Event, Document), linking related objects, enforcing constraints, planning multi-step actions as graph transformations, or when skills need to share state. Trigger on "remember", "what do I know about", "link X to Y", "show dependencies", entity CRUD, or cross-skill data access.
Ultimate AI agent memory system for Cursor, Claude, ChatGPT & Copilot. WAL protocol + vector search + git-notes + cloud backup. Never lose context again. Vibe-coding ready.
Headless browser automation CLI optimized for AI agents with accessibility tree snapshots and ref-based element selection