swarm

Cut your LLM costs by 200x. Offload parallel, batch, and research work to Gemini Flash workers instead of burning your expensive primary model.
Install via ClawdBot CLI:
clawdbot install Chair4ce/swarm

Requires:
Turn your expensive model into an affordable daily driver. Offload the boring stuff (parallel, batch, and research work) to Gemini Flash workers at a fraction of the cost.
| 30 tasks via | Time | Cost |
|--------------|------|------|
| Opus (sequential) | ~30s | ~$0.50 |
| Swarm (parallel) | ~1s | ~$0.003 |
Swarm is ideal for:
# Check daemon (do this every session)
swarm status
# Start if not running
swarm start
# Parallel prompts
swarm parallel "What is X?" "What is Y?" "What is Z?"
# Research multiple subjects
swarm research "OpenAI" "Anthropic" "Mistral" --topic "AI safety"
# Discover capabilities
swarm capabilities
N prompts → N workers simultaneously. Best for independent tasks.
swarm parallel "prompt1" "prompt2" "prompt3"
Multi-phase: search → fetch → analyze. Uses Google Search grounding.
swarm research "Buildertrend" "Jobber" --topic "pricing 2026"
Data flows through multiple stages, each with a different perspective/filter. Stages run in sequence; tasks within a stage run in parallel.
Stage modes:
- parallel: N inputs → N workers (same perspective)
- single: merged input → 1 worker
- fan-out: 1 input → N workers with DIFFERENT perspectives
- reduce: N inputs → 1 synthesized output

Auto-chain (describe what you want, get an optimal pipeline):
curl -X POST http://localhost:9999/chain/auto \
-d '{"task":"Find business opportunities","data":"...market data...","depth":"standard"}'
Manual chain:
swarm chain pipeline.json
# or
echo '{"stages":[...]}' | swarm chain --stdin
Depth presets: quick (2 stages), standard (4), deep (6), exhaustive (8)
Built-in perspectives: extractor, filter, enricher, analyst, synthesizer, challenger, optimizer, strategist, researcher, critic
Preview without executing:
curl -X POST http://localhost:9999/chain/preview \
-d '{"task":"...","depth":"standard"}'
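The four stage modes differ only in how inputs are mapped onto worker calls. The sketch below illustrates that mapping; `planStage` and the `{ mode, perspectives }` stage shape are hypothetical names chosen for illustration, not Swarm's actual pipeline schema.

```javascript
// Hypothetical helper: shows how each stage mode maps inputs to worker tasks.
// The stage shape ({ mode, perspectives }) is an assumption, not Swarm's real schema.
function planStage(stage, inputs) {
  const perspectives = stage.perspectives ?? ['analyst'];
  switch (stage.mode) {
    case 'parallel': // N inputs -> N workers, same perspective
      return inputs.map((input) => ({ perspective: perspectives[0], input }));
    case 'single': // merged input -> 1 worker
      return [{ perspective: perspectives[0], input: inputs.join('\n') }];
    case 'fan-out': // 1 input -> N workers, one per perspective
      return perspectives.map((perspective) => ({ perspective, input: inputs[0] }));
    case 'reduce': // N inputs -> 1 synthesized output (synthesizer is assumed here)
      return [{ perspective: 'synthesizer', input: inputs.join('\n') }];
    default:
      throw new Error(`Unknown stage mode: ${stage.mode}`);
  }
}

const fanOut = planStage(
  { mode: 'fan-out', perspectives: ['analyst', 'critic', 'strategist'] },
  ['market data'],
);
console.log(fanOut.map((task) => task.perspective));
```

Each returned task would correspond to one worker call; tasks within a stage can run in parallel, while stages themselves run in sequence.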
Compare single vs parallel vs chain on the same task with LLM-as-judge scoring.
curl -X POST http://localhost:9999/benchmark \
-d '{"task":"Analyze X","data":"...","depth":"standard"}'
Scores on 6 FLASK dimensions: accuracy (2x weight), depth (1.5x), completeness, coherence, actionability (1.5x), nuance.
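To make the weighting concrete, here is a minimal sketch that combines per-dimension scores into one number. It assumes the overall score is a weighted mean and that dimensions without a stated multiplier weigh 1x; how the benchmark actually aggregates scores is not documented here.

```javascript
// Weights as listed above; dimensions without a stated multiplier are assumed to be 1x.
const WEIGHTS = {
  accuracy: 2,
  depth: 1.5,
  completeness: 1,
  coherence: 1,
  actionability: 1.5,
  nuance: 1,
};

// Weighted mean of per-dimension scores (all on the same scale, e.g. 0-10).
function weightedScore(scores) {
  let total = 0;
  let weightSum = 0;
  for (const [dimension, weight] of Object.entries(WEIGHTS)) {
    if (typeof scores[dimension] !== 'number') {
      throw new Error(`Missing score: ${dimension}`);
    }
    total += scores[dimension] * weight;
    weightSum += weight;
  }
  return total / weightSum;
}

const example = weightedScore({
  accuracy: 9, depth: 6, completeness: 7, coherence: 8, actionability: 6, nuance: 7,
});
console.log(example); // 58 / 8 = 7.25
```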
Lets the orchestrator discover what execution modes are available:
swarm capabilities
# or
curl http://localhost:9999/capabilities
LRU cache for LLM responses. 212x speedup on cache hits (parallel), 514x on chains.
task.cache = false
# View cache stats
curl http://localhost:9999/cache
# Clear cache
curl -X DELETE http://localhost:9999/cache
Cache stats show in swarm status.
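For readers unfamiliar with LRU eviction, the following is a minimal sketch of the idea, keyed by prompt text. The class name, the key choice, and the size limit are illustrative assumptions and do not describe the daemon's internal cache.

```javascript
// Minimal LRU cache. A Map preserves insertion order, so the first key
// is always the least recently used entry.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
    this.hits = 0;
    this.misses = 0;
  }

  get(key) {
    if (!this.entries.has(key)) {
      this.misses += 1;
      return undefined;
    }
    const value = this.entries.get(key);
    this.entries.delete(key); // re-insert to mark as most recently used
    this.entries.set(key, value);
    this.hits += 1;
    return value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest); // evict the least recently used entry
    }
  }
}

const cache = new LruCache(2);
cache.set('What is X?', 'answer X');
cache.set('What is Y?', 'answer Y');
cache.get('What is X?');              // X becomes most recently used
cache.set('What is Z?', 'answer Z');  // evicts Y
console.log(cache.get('What is Y?')); // undefined
```

A hit returns the stored response without any API call, which is where the large speedups on repeated prompts come from.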
If tasks fail within a chain stage, only the failed tasks get retried (not the whole stage). Default: 1 retry. Configurable per-phase via phase.retries or globally via options.stageRetries.
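The retry behavior can be sketched as a loop that re-runs only the indices that failed. This version is synchronous for brevity (the daemon runs tasks concurrently), and `runStage` and `runTask` are hypothetical names rather than Swarm functions.

```javascript
// Sketch of per-task retry inside one stage: only failed tasks are re-run.
// `runTask` is a hypothetical worker function that throws on failure.
function runStage(tasks, runTask, retries = 1) {
  const results = new Array(tasks.length);
  let pending = tasks.map((_, index) => index);
  for (let attempt = 0; attempt <= retries && pending.length > 0; attempt += 1) {
    const failed = [];
    for (const index of pending) {
      try {
        results[index] = { ok: true, value: runTask(tasks[index], attempt) };
      } catch (error) {
        results[index] = { ok: false, error: error.message };
        failed.push(index);
      }
    }
    pending = failed; // the next attempt only touches these
  }
  return results;
}

// Demo: task "b" fails on its first attempt, then succeeds on the retry.
const calls = [];
const stageResults = runStage(['a', 'b', 'c'], (task, attempt) => {
  calls.push(`${task}#${attempt}`);
  if (task === 'b' && attempt === 0) throw new Error('transient failure');
  return task.toUpperCase();
});
console.log(calls); // [ 'a#0', 'b#0', 'c#0', 'b#1' ]
```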
All endpoints return cost data in their complete event:
- session: current daemon session totals
- daily: persisted across restarts, accumulates all day

swarm status # Shows session + daily cost
swarm savings # Monthly savings report
Workers search the live web via Google Search grounding (Gemini only, no extra cost).
# Research uses web search by default
swarm research "Subject" --topic "angle"
# Parallel with web search
curl -X POST http://localhost:9999/parallel \
-d '{"prompts":["Current price of X?"],"options":{"webSearch":true}}'
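The same request can be built from Node. In this sketch the endpoint, port, and `options.webSearch` field come from the curl example; the helper name `buildParallelRequest` and the `Content-Type` header are assumptions, and the actual network call is left as a comment because it needs a running daemon.

```javascript
// Builds the request shown in the curl example above.
// The helper name and the Content-Type header are assumptions.
function buildParallelRequest(prompts, { webSearch = false, baseUrl = 'http://localhost:9999' } = {}) {
  if (!Array.isArray(prompts) || prompts.length === 0) {
    throw new Error('prompts must be a non-empty array');
  }
  return {
    url: `${baseUrl}/parallel`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompts, options: { webSearch } }),
    },
  };
}

const request = buildParallelRequest(['Current price of X?'], { webSearch: true });
console.log(request.url, request.init.body);
// With the daemon running (Node 18+):
//   const response = await fetch(request.url, request.init);
```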
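const os = require('os');
const path = require('path');
// require() does not expand "~", so resolve the skill directory explicitly
const lib = path.join(os.homedir(), 'clawd/skills/node-scaling/lib');
const { parallel, research } = require(lib);
const { SwarmClient } = require(path.join(lib, 'client'));
async function main() {
  // Simple parallel
  const result = await parallel(['prompt1', 'prompt2', 'prompt3']);
  // Client with streaming
  const client = new SwarmClient();
  for await (const event of client.parallel(['prompt1', 'prompt2'])) console.log(event);
  for await (const event of client.research(['OpenAI', 'Anthropic'], 'AI safety')) console.log(event);
  // Chain (separate name: `result` is already declared above)
  const chainResult = await client.chainSync({ task: 'Find business opportunities', data: '...market data...', depth: 'standard' });
}
main().catch(console.error);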
swarm start # Start daemon (background)
swarm stop # Stop daemon
swarm status # Status, cost, cache stats
swarm restart # Restart daemon
swarm savings # Monthly savings report
swarm logs [N] # Last N lines of daemon log
| Mode | Tasks | Time | Notes |
|------|-------|------|-------|
| Parallel (simple) | 5 | ~700ms | 142ms/task effective |
| Parallel (stress) | 10 | ~1.2s | 123ms/task effective |
| Chain (standard) | 5 | ~14s | 3-stage multi-perspective |
| Chain (quick) | 2 | ~3s | 2-stage extract+synthesize |
| Cache hit | any | ~3-5ms | 200-500x speedup |
| Research (web) | 2 | ~15s | Google grounding latency |
Location: ~/.config/clawdbot/node-scaling.yaml
node_scaling:
enabled: true
limits:
max_nodes: 16
max_concurrent_api: 16
provider:
name: gemini
model: gemini-2.0-flash
web_search:
enabled: true
parallel_default: false
cost:
max_daily_spend: 10.00
| Issue | Fix |
|-------|-----|
| Daemon not running | swarm start |
| No API key | Set GEMINI_API_KEY or run npm run setup |
| Rate limited | Lower max_concurrent_api in config |
| Web search not working | Ensure provider is gemini + web_search.enabled |
| Cache stale results | curl -X DELETE http://localhost:9999/cache |
| Chain too slow | Use depth: "quick" or check context size |
Force JSON output with schema validation: zero parse failures on structured tasks.
# With built-in schema
curl -X POST http://localhost:9999/structured \
-d '{"prompt":"Extract entities from: Tim Cook announced iPhone 17","schema":"entities"}'
# With custom schema
curl -X POST http://localhost:9999/structured \
-d '{"prompt":"Classify this text","data":"...","schema":{"type":"object","properties":{"category":{"type":"string"}}}}'
# JSON mode (no schema, just force JSON)
curl -X POST http://localhost:9999/structured \
-d '{"prompt":"Return a JSON object with name, age, city for a fictional person"}'
# List available schemas
curl http://localhost:9999/structured/schemas
Built-in schemas: entities, summary, comparison, actions, classification, qa
Uses Gemini's native response_mime_type: application/json + responseSchema for guaranteed JSON output. Includes schema validation on the response.
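As an illustration of what validating a response against a schema involves, here is a deliberately small validator for flat object schemas like the custom one in the curl example. It is a sketch, not the daemon's validator: it only handles `string`, `number`, and `boolean` property types, and the `required` list is a standard JSON Schema keyword that the example above does not use.

```javascript
// Minimal validator for flat object schemas. Only top-level properties of type
// string/number/boolean are checked; a full JSON Schema validator covers far more.
function validate(schema, value) {
  const errors = [];
  if (schema.type === 'object') {
    if (value === null || typeof value !== 'object' || Array.isArray(value)) {
      return ['expected an object'];
    }
    for (const [name, rule] of Object.entries(schema.properties ?? {})) {
      if (name in value && typeof value[name] !== rule.type) {
        errors.push(`${name}: expected ${rule.type}, got ${typeof value[name]}`);
      }
    }
    for (const name of schema.required ?? []) {
      if (!(name in value)) errors.push(`${name}: missing required property`);
    }
  }
  return errors;
}

const schema = {
  type: 'object',
  properties: { category: { type: 'string' } },
  required: ['category'],
};
console.log(validate(schema, JSON.parse('{"category":"news"}'))); // []
console.log(validate(schema, JSON.parse('{"category":42}')));
// [ 'category: expected string, got number' ]
```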
Same prompt → N parallel executions → pick the best answer. Higher accuracy on factual/analytical tasks.
# Judge strategy (LLM picks best; most reliable)
curl -X POST http://localhost:9999/vote \
-d '{"prompt":"What are the key factors in SaaS pricing?","n":3,"strategy":"judge"}'
# Similarity strategy (consensus; zero extra cost)
curl -X POST http://localhost:9999/vote \
-d '{"prompt":"What year was Python released?","n":3,"strategy":"similarity"}'
# Longest strategy (heuristic; zero extra cost)
curl -X POST http://localhost:9999/vote \
-d '{"prompt":"Explain recursion","n":3,"strategy":"longest"}'
Strategies:
- judge: LLM scores all candidates on accuracy/completeness/clarity/actionability, picks winner (N+1 calls)
- similarity: Jaccard word-set similarity, picks consensus answer (N calls, zero extra cost)
- longest: Picks longest response as heuristic for thoroughness (N calls, zero extra cost)

When to use: Factual questions, critical decisions, or any task where accuracy > speed.
| Strategy | Calls | Extra Cost | Quality |
|----------|-------|-----------|---------|
| similarity | N | $0 | Good (consensus) |
| longest | N | $0 | Decent (heuristic) |
| judge | N+1 | ~$0.0001 | Best (LLM-scored) |
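The similarity strategy needs no extra LLM call because the consensus can be computed locally. A minimal sketch of Jaccard word-set voting follows; the tokenization (lowercased `\w+` runs) and the tie-breaking rule (first candidate wins) are assumptions, not details confirmed for Swarm.

```javascript
// Jaccard similarity between the word sets of two answers:
// shared words divided by total distinct words.
function jaccard(a, b) {
  const setA = new Set(a.toLowerCase().match(/\w+/g) ?? []);
  const setB = new Set(b.toLowerCase().match(/\w+/g) ?? []);
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  for (const word of setA) if (setB.has(word)) shared += 1;
  return shared / (setA.size + setB.size - shared);
}

// Consensus pick: the candidate with the highest total similarity to the others.
function pickConsensus(candidates) {
  let best = 0;
  let bestScore = -Infinity;
  candidates.forEach((candidate, i) => {
    const score = candidates.reduce(
      (sum, other, j) => (i === j ? sum : sum + jaccard(candidate, other)),
      0,
    );
    if (score > bestScore) {
      best = i;
      bestScore = score;
    }
  });
  return candidates[best];
}

const answers = [
  'Python was first released in 1991',
  'Python was released in 1991 by Guido van Rossum',
  'It came out in 2001',
];
console.log(pickConsensus(answers)); // the first answer: it overlaps most with the rest
```

The outlier answer shares almost no words with the other two, so it can never win the vote, which is the intuition behind using consensus for factual questions.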
Optional critic pass after chain/skeleton output. Scores 5 dimensions, auto-refines if below threshold.
# Add reflect:true to any chain or skeleton request
curl -X POST http://localhost:9999/chain/auto \
-d '{"task":"Analyze the AI chip market","data":"...","reflect":true}'
curl -X POST http://localhost:9999/skeleton \
-d '{"task":"Write a market analysis","reflect":true}'
Proven: improved weak output from 5.0 → 7.6 avg score. Skeleton + reflect scored 9.4/10.
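The critic pass can be pictured as a score-then-refine loop. In this sketch `score` and `refine` stand in for LLM calls, and the threshold of 7 and the two-round limit are illustrative values, not Swarm's documented defaults.

```javascript
// Sketch of a critic pass: score the draft, refine while it is below the threshold.
// `score` and `refine` are placeholders for LLM calls; threshold and maxRounds
// are illustrative assumptions.
function reflect(draft, { score, refine, threshold = 7, maxRounds = 2 }) {
  let current = draft;
  let currentScore = score(current);
  let rounds = 0;
  while (currentScore < threshold && rounds < maxRounds) {
    current = refine(current, currentScore);
    currentScore = score(current);
    rounds += 1;
  }
  return { output: current, score: currentScore, rounds };
}

// Toy stand-ins: longer drafts score higher, refinement appends detail.
const reflected = reflect('Short draft.', {
  score: (text) => Math.min(10, text.length / 5),
  refine: (text) => `${text} Added supporting detail.`,
});
console.log(reflected.rounds, reflected.output);
```

Capping the number of rounds keeps the extra cost bounded even when the critic never reaches the threshold.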
Generate outline → expand each section in parallel → merge into coherent document. Best for long-form content.
curl -X POST http://localhost:9999/skeleton \
-d '{"task":"Write a comprehensive guide to SaaS pricing","maxSections":6,"reflect":true}'
Performance: 14,478 chars in 21s (675 chars/sec), which is 5.1x more content than chain at 2.9x higher throughput.
| Metric | Chain | Skeleton-of-Thought | Winner |
|--------|-------|---------------------|--------|
| Output size | 2,856 chars | 14,478 chars | SoT (5.1x) |
| Throughput | 234 chars/sec | 675 chars/sec | SoT (2.9x) |
| Duration | 12s | 21s | Chain (faster) |
| Quality (w/ reflect) | ~7-8/10 | 9.4/10 | SoT |
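The ratios in the table can be re-derived from its raw figures. Note that the durations shown are rounded, so the rates computed here (about 238 and 689 chars/sec) differ slightly from the table's 234 and 675, presumably because those were measured with unrounded timings; the ratios still round to the same values.

```javascript
// Figures taken from the comparison table above.
const chain = { chars: 2856, seconds: 12 };
const skeleton = { chars: 14478, seconds: 21 };

const sizeRatio = skeleton.chars / chain.chars;          // output size ratio
const chainRate = chain.chars / chain.seconds;           // chars/sec
const skeletonRate = skeleton.chars / skeleton.seconds;  // chars/sec
const throughputRatio = skeletonRate / chainRate;

console.log(sizeRatio.toFixed(1), throughputRatio.toFixed(1)); // 5.1 2.9
```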
When to use what:
| Method | Path | Description |
|--------|------|-------------|
| GET | /health | Health check |
| GET | /status | Detailed status + cost + cache |
| GET | /capabilities | Discover execution modes |
| POST | /parallel | Execute N prompts in parallel |
| POST | /research | Multi-phase web research |
| POST | /skeleton | Skeleton-of-Thought (outline → expand → merge) |
| POST | /chain | Manual chain pipeline |
| POST | /chain/auto | Auto-build + execute chain |
| POST | /chain/preview | Preview chain without executing |
| POST | /chain/template | Execute pre-built template |
| POST | /structured | Forced JSON with schema validation |
| GET | /structured/schemas | List built-in schemas |
| POST | /vote | Majority voting (best-of-N) |
| POST | /benchmark | Quality comparison test |
| GET | /templates | List chain templates |
| GET | /cache | Cache statistics |
| DELETE | /cache | Clear cache |
| Model | Cost per 1M tokens | Relative |
|-------|-------------------|----------|
| Claude Opus 4 | ~$15 input / $75 output | 1x |
| GPT-4o | ~$2.50 input / $10 output | ~7x cheaper |
| Gemini Flash | ~$0.075 input / $0.30 output | 200x cheaper |
Cache hits are essentially free (~3-5ms, no API call).
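To see how the per-token prices translate into a batch cost, here is a small estimator using the approximate prices from the table. The workload (30 tasks at 500 input and 200 output tokens each) is an assumed example, not a measured figure.

```javascript
// Approximate prices in USD per 1M tokens, taken from the table above.
const PRICES = {
  'Claude Opus 4': { input: 15, output: 75 },
  'GPT-4o': { input: 2.5, output: 10 },
  'Gemini Flash': { input: 0.075, output: 0.3 },
};

// Cost of a batch of identical tasks; per-task token counts are the caller's estimate.
function batchCost(model, tasks, inputTokens, outputTokens) {
  const price = PRICES[model];
  if (!price) throw new Error(`Unknown model: ${model}`);
  return (tasks * (inputTokens * price.input + outputTokens * price.output)) / 1e6;
}

// Assumed workload: 30 tasks, 500 input and 200 output tokens each.
const opus = batchCost('Claude Opus 4', 30, 500, 200);
const flash = batchCost('Gemini Flash', 30, 500, 200);
console.log(opus.toFixed(3), flash.toFixed(4), Math.round(opus / flash));
```

The resulting ratio always lands between 200x (the input-price ratio) and 250x (the output-price ratio), depending on the mix of input and output tokens.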
Generated Mar 1, 2026
Marketing teams can use Swarm's research mode to simultaneously analyze multiple competitors' pricing, features, and market positioning. The parallel processing allows comparing 5-10 companies in seconds instead of hours, with web search grounding providing current data.
Legal or financial firms can process hundreds of contracts or reports using chain mode with extractor, analyst, and synthesizer perspectives. The response cache gives up to a 514x speedup when the same chain request repeats, while parallel processing handles batch analysis efficiently.
Content creators can research multiple topics simultaneously using research mode with web search, then use chain mode to generate outlines from different perspectives. The cost tracking helps agencies monitor client project expenses while achieving 200x cost reduction versus premium models.
Product managers can benchmark their features against 8-10 competitors using parallel mode with web search enabled. The benchmark mode provides FLASK dimension scoring to objectively compare analysis quality, while cost tracking shows savings versus manual research.
Researchers can use research mode to gather information on multiple related topics, then apply chain mode with critic and synthesizer perspectives to identify gaps in literature. The parallel processing handles multiple search queries simultaneously, dramatically reducing literature review time.
Offer tiered subscriptions based on monthly task volume, node limits, and chain depth access. Enterprise tiers include advanced perspectives, higher cache limits, and priority support. The 200x cost reduction versus premium models provides clear ROI for subscribers.
Provide the Swarm engine as a cloud API with pay-per-task pricing. Charge based on task complexity, with research and chain modes priced higher than simple parallel tasks. Include free tier for developers and volume discounts for enterprise clients.
Sell on-premise installations to regulated industries (finance, healthcare) with custom perspectives and integration support. Include training, custom pipeline development, and dedicated instance management. The cost savings justify six-figure annual licenses.
Integration Tip
Start with swarm status to ensure the daemon is running, then use parallel mode for simple batch tasks before progressing to chain mode. Track costs from day one to demonstrate ROI to stakeholders.