# openclaw-token-optimizer

Reduce OpenClaw token usage and API costs through smart model routing, heartbeat optimization, budget tracking, and native 2026.2.15 features (session prunin...).
Install via ClawdBot CLI:

```bash
clawdbot install Asif2BD/openclaw-token-optimizer
```

Comprehensive toolkit for reducing token usage and API costs in OpenClaw deployments. Combines smart model routing, optimized heartbeat intervals, usage tracking, and multi-provider strategies.
Immediate actions (no config changes needed):

```bash
python3 scripts/context_optimizer.py generate-agents
# Creates AGENTS.md.optimized; review and replace your current AGENTS.md

python3 scripts/context_optimizer.py recommend "hi, how are you?"
# Shows: Only 2 files needed (not 50+!)

cp assets/HEARTBEAT.template.md ~/.openclaw/workspace/HEARTBEAT.md

python3 scripts/model_router.py "thanks!"
# Single-provider Anthropic setup: Use Sonnet, not Opus
# Multi-provider setup (OpenRouter/Together): Use Haiku for max savings

python3 scripts/token_tracker.py check
```
Expected savings: 50-80% reduction in token costs for typical workloads (context optimization is the biggest factor!).
The single highest-impact optimization available. Most agents burn 3,000-15,000 tokens per session loading skill files they never use. Stop that first.
The pattern:
- Keep a SKILLS.md catalog in your workspace (~300 tokens: a list of skills plus when to load each one)

Token savings:
| Library size | Before (eager) | After (lazy) | Savings |
|---|---|---|---|
| 5 skills | ~3,000 tokens | ~600 tokens | 80% |
| 10 skills | ~6,500 tokens | ~750 tokens | 88% |
| 20 skills | ~13,000 tokens | ~900 tokens | 93% |
Quick implementation in AGENTS.md:

```markdown
## Skills
At session start: Read SKILLS.md (the index only, ~300 tokens).
Load individual skill files ONLY when a task requires them.
Never load all skills upfront.
```
Full implementation (with catalog template + optimizer script):

```bash
clawhub install openclaw-skill-lazy-loader
```
The companion skill openclaw-skill-lazy-loader includes a SKILLS.md.template, an AGENTS.md.template lazy-loading section, and a context_optimizer.py CLI that recommends exactly which skills to load for any given task.
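Even without the companion skill, the catalog pattern is easy to approximate. A minimal sketch, assuming an illustrative file layout (the template's actual structure may differ):

```python
from pathlib import Path

def load_catalog(workspace: str) -> str:
    # Read only the small SKILLS.md index at session start (~300 tokens).
    return (Path(workspace) / "SKILLS.md").read_text()

def load_skill(workspace: str, name: str) -> str:
    # Load an individual skill file only when a task actually needs it.
    return (Path(workspace) / "skills" / f"{name}.md").read_text()
```

The catalog is always in context; full skill files are pulled in on demand, which is where the 80-93% savings in the table come from.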
Lazy loading handles context loading costs. The remaining capabilities below handle runtime costs. Together they cover the full token lifecycle.
Biggest token saver: only load the files you actually need, not everything upfront.
Problem: by default, OpenClaw loads ALL context files every session.
Solution: Lazy loading based on prompt complexity.
Usage:

```bash
python3 scripts/context_optimizer.py recommend "<user prompt>"
```
Examples:

```
# Simple greeting → minimal context (2 files only!)
context_optimizer.py recommend "hi"
→ Load: SOUL.md, IDENTITY.md
→ Skip: Everything else
→ Savings: ~80% of context

# Standard work → selective loading
context_optimizer.py recommend "write a function"
→ Load: SOUL.md, IDENTITY.md, memory/TODAY.md
→ Skip: docs, old memory, knowledge base
→ Savings: ~50% of context

# Complex task → full context
context_optimizer.py recommend "analyze our entire architecture"
→ Load: SOUL.md, IDENTITY.md, MEMORY.md, memory/TODAY+YESTERDAY.md
→ Conditionally load: Relevant docs only
→ Savings: ~30% of context
```
Output format:

```json
{
  "complexity": "simple",
  "context_level": "minimal",
  "recommended_files": ["SOUL.md", "IDENTITY.md"],
  "file_count": 2,
  "savings_percent": 80,
  "skip_patterns": ["docs/**/*.md", "memory/20*.md"]
}
```
Integration pattern (before loading context for a new session):

```python
from context_optimizer import recommend_context_bundle

user_prompt = "thanks for your help"
recommendation = recommend_context_bundle(user_prompt)

if recommendation["context_level"] == "minimal":
    # Load only SOUL.md + IDENTITY.md, skip everything else
    # (saves ~80% of context tokens)
    files_to_load = recommendation["recommended_files"]
```
Generate optimized AGENTS.md:

```bash
context_optimizer.py generate-agents
# Creates AGENTS.md.optimized with lazy-loading instructions
# Review and replace your current AGENTS.md
```
Expected savings: 50-80% reduction in context tokens.
Automatically classify tasks and route to appropriate model tiers.
NEW: communication pattern enforcement, so you never waste Opus tokens on "hi" or "thanks"!
Usage:

```bash
python3 scripts/model_router.py "<user prompt>" [current_model] [force_tier]
```
Examples:

```
# Communication (NEW!) → ALWAYS Haiku
python3 scripts/model_router.py "thanks!"
python3 scripts/model_router.py "hi"
python3 scripts/model_router.py "ok got it"
→ Enforced: Haiku (NEVER Sonnet/Opus for casual chat)

# Simple task → suggests Haiku
python3 scripts/model_router.py "read the log file"

# Medium task → suggests Sonnet
python3 scripts/model_router.py "write a function to parse JSON"

# Complex task → suggests Opus
python3 scripts/model_router.py "design a microservices architecture"
```
Patterns enforced to Haiku (NEVER Sonnet/Opus):
Communication:
Background tasks:
Integration pattern:

```python
from model_router import route_task

user_prompt = "show me the config"
routing = route_task(user_prompt)

if routing["should_switch"]:
    model = routing["recommended_model"]
    # routing["cost_savings_percent"] reports the expected savings
```
Customization:
Edit ROUTING_RULES or COMMUNICATION_PATTERNS in scripts/model_router.py to adjust patterns and keywords.
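For orientation when customizing, the communication patterns likely amount to simple regex prefix matches. A hedged sketch; the actual table names and regexes in `model_router.py` may differ:

```python
import re

# Hypothetical shape of the pattern table; edit to taste.
COMMUNICATION_PATTERNS = [
    r"^(hi|hey|hello)\b",
    r"^(thanks|thank you|thx)\b",
    r"^(ok|okay|got it)\b",
]

def is_communication(prompt: str) -> bool:
    # Casual chat always routes to Haiku, regardless of the current model.
    text = prompt.strip().lower()
    return any(re.match(p, text) for p in COMMUNICATION_PATTERNS)

print(is_communication("thanks!"))  # True
```

Adding a new pattern is just appending another regex to the list.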
Reduce API calls from heartbeat polling with smart interval tracking:
Setup:

```bash
# Copy template to workspace
cp assets/HEARTBEAT.template.md ~/.openclaw/workspace/HEARTBEAT.md

# Plan which checks should run
python3 scripts/heartbeat_optimizer.py plan
```
Commands:

```bash
# Check if a specific type should run now
heartbeat_optimizer.py check email
heartbeat_optimizer.py check calendar

# Record that a check was performed
heartbeat_optimizer.py record email

# Update a check interval (seconds)
heartbeat_optimizer.py interval email 7200  # 2 hours

# Reset state
heartbeat_optimizer.py reset
```
How it works:
- Replies `HEARTBEAT_OK` when nothing needs attention (saves tokens)

Default intervals:
Integration in HEARTBEAT.md:

```markdown
## Email Check
Run only if `heartbeat_optimizer.py check email` returns `should_check: true`.
After checking, run `heartbeat_optimizer.py record email`.
```
Expected savings: 50% reduction in heartbeat API calls.
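The gating idea behind the check/record cycle is simple interval bookkeeping. A minimal sketch; the state-file format, field names, and default intervals here are assumptions, not the script's real implementation:

```python
import json
import time
from pathlib import Path

INTERVALS = {"email": 3600, "calendar": 7200}  # seconds between checks

def _load(state_path):
    p = Path(state_path)
    return json.loads(p.read_text()) if p.exists() else {}

def should_check(kind, state_path, now=None):
    # True when the check's interval has elapsed since it last ran.
    now = time.time() if now is None else now
    last = _load(state_path).get(kind)
    if last is None:
        return True  # never checked before
    return now - last >= INTERVALS.get(kind, 3600)

def record(kind, state_path, now=None):
    # Persist the timestamp so the next heartbeat can skip this check.
    now = time.time() if now is None else now
    state = _load(state_path)
    state[kind] = now
    Path(state_path).write_text(json.dumps(state))
```

Every skipped check is one fewer tool call per heartbeat, which is where the ~50% reduction comes from.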
Model enforcement: heartbeat should ALWAYS use Haiku; see the updated HEARTBEAT.template.md for model override instructions.
Problem: Cronjobs often default to expensive models (Sonnet/Opus) even for routine tasks.
Solution: Always specify Haiku for 90% of scheduled tasks.
See: assets/cronjob-model-guide.md for comprehensive guide with examples.
Quick reference:
| Task Type | Model | Example |
|-----------|-------|---------|
| Monitoring/alerts | Haiku | Check server health, disk space |
| Data parsing | Haiku | Extract CSV/JSON/logs |
| Reminders | Haiku | Daily standup, backup reminders |
| Simple reports | Haiku | Status summaries |
| Content generation | Sonnet | Blog summaries (quality matters) |
| Deep analysis | Sonnet | Weekly insights |
| Complex reasoning | n/a | Never use Opus for cronjobs |
Example (good):

```bash
# Parse daily logs with Haiku
cron add --schedule "0 2 * * *" \
  --payload '{
    "kind":"agentTurn",
    "message":"Parse yesterday error logs and summarize",
    "model":"anthropic/claude-haiku-4"
  }' \
  --sessionTarget isolated
```
Example (bad):

```bash
# BAD: using Opus for a simple check (60x more expensive!)
cron add --schedule "*/15 * * * *" \
  --payload '{
    "kind":"agentTurn",
    "message":"Check email",
    "model":"anthropic/claude-opus-4"
  }' \
  --sessionTarget isolated
```
Savings: Using Haiku instead of Opus for 10 daily cronjobs = $17.70/month saved per agent.
Integration with model_router:

```bash
# Test whether your cronjob should use Haiku
model_router.py "parse daily error logs"
# → Output: Haiku (background task pattern detected)
```
Monitor usage and alert when approaching limits:
Setup:

```bash
# Check current daily usage
python3 scripts/token_tracker.py check

# Get model suggestions
python3 scripts/token_tracker.py suggest general

# Reset daily tracking
python3 scripts/token_tracker.py reset
```
Output format:

```json
{
  "date": "2026-02-06",
  "cost": 2.50,
  "tokens": 50000,
  "limit": 5.00,
  "percent_used": 50,
  "status": "ok",
  "alert": null
}
```
Status levels:
- `ok`: below 80% of the daily limit
- `warning`: 80-99% of the daily limit
- `exceeded`: over the daily limit

Integration pattern:
Before starting expensive operations, check budget:
```python
import json
import subprocess

result = subprocess.run(
    ["python3", "scripts/token_tracker.py", "check"],
    capture_output=True, text=True
)
budget = json.loads(result.stdout)

if budget["status"] == "exceeded":
    # Switch to the cheapest model or defer non-urgent work
    use_model = "anthropic/claude-haiku-4"
elif budget["status"] == "warning":
    # Use a balanced model
    use_model = "anthropic/claude-sonnet-4-5"
else:
    # Budget is fine; keep whatever model routing suggested
    use_model = None
```
Customization:
Edit daily_limit_usd and warn_threshold parameters in function calls.
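The status levels reduce to two thresholds on daily spend. A small sketch of that logic; the parameter names follow the ones mentioned above, and the defaults are assumptions:

```python
def budget_status(cost: float, daily_limit_usd: float = 5.00,
                  warn_threshold: float = 0.80) -> str:
    # Maps daily spend to the ok / warning / exceeded levels above.
    if cost > daily_limit_usd:
        return "exceeded"
    if cost >= warn_threshold * daily_limit_usd:
        return "warning"
    return "ok"

print(budget_status(2.50))  # ok (50% of the $5.00 limit)
```

Raising `warn_threshold` delays the switch to cheaper models; lowering it makes the agent more conservative.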
See references/PROVIDERS.md for a comprehensive provider guide.
Quick reference:
| Provider | Model | Cost/MTok | Use Case |
|----------|-------|-----------|----------|
| Anthropic | Haiku 4 | $0.25 | Simple tasks |
| Anthropic | Sonnet 4.5 | $3.00 | Balanced default |
| Anthropic | Opus 4 | $15.00 | Complex reasoning |
| OpenRouter | Gemini 2.5 Flash | $0.075 | Bulk operations |
| Google AI | Gemini 2.0 Flash Exp | FREE | Dev/testing |
| Together | Llama 3.3 70B | $0.18 | Open alternative |
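The pricing table translates directly into per-call input costs. A quick sanity check using the Anthropic rows (input pricing only; output tokens would add more):

```python
# $/MTok input prices from the table above
PRICE_PER_MTOK = {"haiku-4": 0.25, "sonnet-4.5": 3.00, "opus-4": 15.00}

def input_cost(tokens: int, model: str) -> float:
    return tokens / 1_000_000 * PRICE_PER_MTOK[model]

# A 50K-token prompt costs 60x more on Opus than on Haiku,
# matching the "60x more expensive" warning in the cronjob section.
ratio = input_cost(50_000, "opus-4") / input_cost(50_000, "haiku-4")
print(ratio)  # 60.0
```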
See assets/config-patches.json for advanced optimizations:
Implemented by this skill:
Native OpenClaw 2026.2.15 (apply directly):
- `contextPruning: cache-ttl`: auto-trims old tool results after the Anthropic cache TTL expires
- `bootstrapMaxChars` / `bootstrapTotalMaxChars`: caps workspace file injection size
- `cacheRetention: "long"` (for Opus): amortizes cache write costs

Requires OpenClaw core support:
- … (covered by the `context_optimizer.py` script today)

Apply config patches:

```bash
# Example: Enable multi-provider fallback
gateway config.patch --patch '{"providers": [...]}'
```
OpenClaw 2026.2.15 added built-in commands that complement this skill's Python scripts. Use these first for quick diagnostics before reaching for the scripts.
- `/context list`: token count per injected file (shows exactly what's eating your prompt)
- `/context detail`: full breakdown including tools, skills, and system prompt sections

Use these before applying `bootstrap_size_limits`: see which files are oversized, then set `bootstrapMaxChars` accordingly.
- `/usage tokens`: append token count to every reply
- `/usage full`: append tokens + cost estimate to every reply
- `/usage cost`: show cumulative cost summary from session logs
- `/usage off`: disable the usage footer

Combine with token_tracker.py: `/usage cost` gives session totals, while `token_tracker.py` tracks the daily budget.
- `/status`: model, context %, last response tokens, estimated cost
The problem: Anthropic charges roughly 3.75x more for cache writes than for cache reads. If your agent goes idle and the 1h cache TTL expires, the next request re-writes the entire prompt cache, which is expensive.
The fix: set the heartbeat interval to 55min (just under the 1h TTL). The heartbeat keeps the cache warm, so every subsequent request pays cache-read rates instead.
```bash
# Get the optimal interval for your cache TTL
python3 scripts/heartbeat_optimizer.py cache-ttl
# → recommended_interval: 55min (3300s)
# → explanation: keeps the 1h Anthropic cache warm

# Custom TTL (e.g., if you've configured a 2h cache)
python3 scripts/heartbeat_optimizer.py cache-ttl 7200
# → recommended_interval: 115min
```
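The rule appears to be "stay a safety margin under the TTL"; with a 5-minute margin, both outputs above fall out directly. The margin value is an inference from those numbers, not confirmed from the script:

```python
def recommended_heartbeat_seconds(cache_ttl_s: int = 3600,
                                  margin_s: int = 300) -> int:
    # Fire the heartbeat just before the prompt cache would expire.
    return cache_ttl_s - margin_s

print(recommended_heartbeat_seconds())      # 3300 (55 min, for the 1h TTL)
print(recommended_heartbeat_seconds(7200))  # 6900 (115 min, for a 2h TTL)
```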
Apply to your OpenClaw config:

```json
{
  "agents": {
    "defaults": {
      "heartbeat": {
        "every": "55m"
      }
    }
  }
}
```
Who benefits: Anthropic API key users only. OAuth profiles already default to a 1h heartbeat (OpenClaw smart default). API key profiles default to 30min; bumping to 55min is both cheaper (fewer calls) and cache-warm.
- … `HEARTBEAT.md` (expected savings: 20-30%)
- … (expected savings: 40-60%)
- … (expected savings: 70-90%)
```bash
# 1. User sends message
user_msg="debug this error in the logs"

# 2. Route to appropriate model
routing=$(python3 scripts/model_router.py "$user_msg")
model=$(echo "$routing" | jq -r .recommended_model)

# 3. Check budget before proceeding
budget=$(python3 scripts/token_tracker.py check)
status=$(echo "$budget" | jq -r .status)

if [ "$status" = "exceeded" ]; then
  # Use the cheapest model regardless of routing
  model="anthropic/claude-haiku-4"
fi

# 4. Process with the selected model
# (OpenClaw handles this via config or override)
```
## HEARTBEAT.md
```bash
# Plan what to check
result=$(python3 scripts/heartbeat_optimizer.py plan)
should_run=$(echo "$result" | jq -r .should_run)

if [ "$should_run" = "false" ]; then
  echo "HEARTBEAT_OK"
  exit 0
fi

# Run only the planned checks
planned=$(echo "$result" | jq -r '.planned[].type')
for check in $planned; do
  case "$check" in
    email) check_email ;;
    calendar) check_calendar ;;
  esac
  python3 scripts/heartbeat_optimizer.py record "$check"
done
```
Issue: Scripts fail with "module not found"
Issue: State files not persisting
- Ensure the `~/.openclaw/workspace/memory/` directory exists and is writable.

Issue: Budget tracking shows $0.00
- `token_tracker.py` needs integration with OpenClaw's session_status tool; currently it tracks manually recorded usage.

Issue: Routing suggests wrong model tier
- Edit `ROUTING_RULES` in `model_router.py` for your specific patterns.

Daily:
- Run `token_tracker.py check`

Weekly:

Monthly:
- Update `PROVIDERS.md` with new options

Example: 100K tokens/day workload
Without skill:
| Strategy | Context | Model | Daily Cost | Monthly | Savings |
|----------|---------|-------|-----------|---------|---------|
| Baseline (no optimization) | 50K | Sonnet | $0.30 | $9.00 | 0% |
| Context opt only | 10K (-80%) | Sonnet | $0.18 | $5.40 | 40% |
| Model routing only | 50K | Mixed | $0.18 | $5.40 | 40% |
| Both (this skill) | 10K | Mixed | $0.09 | $2.70 | 70% |
| Aggressive + Gemini | 10K | Gemini | $0.03 | $0.90 | 90% |
Key insight: context optimization (50K → 10K tokens) saves MORE than model routing!
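The savings column follows directly from the daily costs in the table:

```python
def savings_percent(baseline_cost: float, optimized_cost: float) -> int:
    # Percentage saved relative to the unoptimized baseline.
    return round((1 - optimized_cost / baseline_cost) * 100)

print(savings_percent(0.30, 0.09))  # 70  (both optimizations combined)
print(savings_percent(0.30, 0.03))  # 90  (aggressive + Gemini)
```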
xCloud hosting scenario (100 customers, 50K tokens/customer/day):
- `context_optimizer.py`: context loading optimization and lazy loading (NEW!)
- `model_router.py`: task classification, model suggestions, and communication enforcement (ENHANCED!)
- `heartbeat_optimizer.py`: interval management and check scheduling
- `token_tracker.py`: budget monitoring and alerts
- `PROVIDERS.md`: alternative AI providers, pricing, and routing strategies
- `HEARTBEAT.template.md`: drop-in optimized heartbeat template with Haiku enforcement (ENHANCED!)
- `cronjob-model-guide.md`: complete guide for choosing models in cronjobs (NEW!)
- `config-patches.json`: advanced configuration examples

Ideas for extending this skill:
Generated Mar 1, 2026
A company deploys an AI-powered customer support chatbot handling thousands of daily inquiries. The skill reduces token costs by lazily loading only relevant support documentation and routing simple queries to cheaper models, cutting API expenses by 50-80% while maintaining response quality.
An organization uses OpenClaw for internal knowledge retrieval across large document repositories. The skill optimizes context loading by dynamically selecting only necessary files per query, preventing excessive token usage from loading entire knowledge bases upfront, ideal for scalable deployments.
A research team employs multiple AI agents to analyze scientific papers and generate reports. The skill enables lazy loading of specialized skills and memory files only when needed, reducing per-session token overhead by up to 93% and allowing cost-effective parallel agent operations.
A marketing agency automates content generation for blogs and social media using OpenClaw. The skill applies model routing to use cheaper models for drafting and expensive ones for refinement, combined with optimized heartbeats and budget tracking to stay within API rate limits and budgets.
A healthcare provider implements an AI system for patient interactions and medical record queries. The skill ensures minimal context loading for routine inquiries and secure, local-only script execution, aligning with privacy requirements while reducing token costs through session pruning and cache optimization.
Offer the skill as part of a subscription-based AI optimization platform for businesses. Provide tiered plans based on usage levels, with premium features like multi-provider routing and advanced analytics, generating recurring revenue from cost-conscious enterprises.
Provide professional services to integrate and customize the skill for large-scale OpenClaw deployments. Charge for implementation, training, and ongoing support, leveraging the skill's verified security and audit to attract clients in regulated industries.
Distribute the core skill for free under an open-source license to build a community. Monetize through paid add-ons like proprietary multi-provider configurations, priority support, or enterprise features, driving adoption and upselling to premium tiers.
💬 Integration Tip
Start by running the context_optimizer.py script to generate an optimized AGENTS.md, then gradually implement lazy loading and model routing based on your specific token usage patterns.