# context-compactor

Token-based context compaction for local models (MLX, llama.cpp, Ollama) that don't report context limits.
Install via ClawdBot CLI:

```bash
clawdbot install emberDesire/context-compactor
```

Automatic context compaction for OpenClaw when using local models that don't properly report token limits or context overflow errors.
Cloud APIs (Anthropic, OpenAI) report context overflow errors, allowing OpenClaw's built-in compaction to trigger. Local models (MLX, llama.cpp, Ollama) often don't report their token limits or raise overflow errors at all.

This leaves you with broken conversations when context gets too long.
Context Compactor estimates tokens client-side and proactively summarizes older messages before hitting the model's limit.
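Conceptually, the client-side estimate can be as simple as total characters divided by `charsPerToken`. A minimal TypeScript sketch (type and function names are illustrative, not the plugin's actual API):

```typescript
// Rough client-side token estimate: total characters / charsPerToken.
// This mirrors the plugin's charsPerToken heuristic; the names here
// are illustrative, not the plugin's real API.
type Message = { role: "user" | "assistant" | "system"; content: string };

function estimateTokens(messages: Message[], charsPerToken = 4): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / charsPerToken);
}
```

At the default ratio, a 4,000-character conversation estimates to roughly 1,000 tokens.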
```
┌────────────────────────────────────────────────┐
│ 1. Message arrives                             │
│ 2. before_agent_start hook fires               │
│ 3. Plugin estimates total context tokens       │
│ 4. If over maxTokens:                          │
│    a. Split into "old" and "recent" messages   │
│    b. Summarize old messages (LLM or fallback) │
│    c. Inject summary as compacted context      │
│ 5. Agent sees: summary + recent + new message  │
└────────────────────────────────────────────────┘
```
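The splitting and injection steps (4a–4c) can be sketched as follows; `summarize` stands in for the plugin's LLM call, and all names are illustrative, not the plugin's source:

```typescript
// Sketch of steps 4a-4c: split old/recent by a recent-token budget,
// summarize the old part, and inject the summary ahead of the tail.
type Msg = { role: string; content: string };

const est = (text: string, charsPerToken = 4) =>
  Math.ceil(text.length / charsPerToken);

function compact(
  messages: Msg[],
  maxTokens: number,
  keepRecentTokens: number,
  summarize: (old: Msg[]) => string,
): Msg[] {
  const total = messages.reduce((s, m) => s + est(m.content), 0);
  if (total <= maxTokens) return messages; // under the limit: no-op

  // Walk backwards, keeping recent messages until the budget is spent.
  let kept = 0;
  let splitAt = messages.length;
  while (splitAt > 0 && kept + est(messages[splitAt - 1].content) <= keepRecentTokens) {
    kept += est(messages[splitAt - 1].content);
    splitAt--;
  }

  const old = messages.slice(0, splitAt);
  const recent = messages.slice(splitAt);
  // Inject the summary as a system message ahead of the recent tail.
  return [{ role: "system", content: summarize(old) }, ...recent];
}
```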
```bash
# One command setup (recommended)
npx jasper-context-compactor setup

# Restart gateway
openclaw gateway restart
```
The setup command automatically:

- Writes `~/.openclaw/extensions/context-compactor/openclaw.json` with sensible defaults

To configure manually, add to `openclaw.json`:
```json
{
  "plugins": {
    "entries": {
      "context-compactor": {
        "enabled": true,
        "config": {
          "maxTokens": 8000,
          "keepRecentTokens": 2000,
          "summaryMaxTokens": 1000,
          "charsPerToken": 4
        }
      }
    }
  }
}
```
| Option | Default | Description |
|--------|---------|-------------|
| enabled | true | Enable/disable the plugin |
| maxTokens | 8000 | Max context tokens before compaction |
| keepRecentTokens | 2000 | Tokens to preserve from recent messages |
| summaryMaxTokens | 1000 | Max tokens for the summary |
| charsPerToken | 4 | Token estimation ratio |
| summaryModel | (session model) | Model to use for summarization |
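The options above could be typed and merged over their defaults like this (an illustrative sketch, not the plugin's source):

```typescript
// Illustrative typing for the config table above.
interface CompactorConfig {
  enabled: boolean;
  maxTokens: number;
  keepRecentTokens: number;
  summaryMaxTokens: number;
  charsPerToken: number;
  summaryModel?: string; // defaults to the session model
}

const DEFAULTS: CompactorConfig = {
  enabled: true,
  maxTokens: 8000,
  keepRecentTokens: 2000,
  summaryMaxTokens: 1000,
  charsPerToken: 4,
};

// Merge user-supplied config over the defaults.
function resolveConfig(user: Partial<CompactorConfig> = {}): CompactorConfig {
  return { ...DEFAULTS, ...user };
}
```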
**MLX (8K context models):**

```json
{
  "maxTokens": 6000,
  "keepRecentTokens": 1500,
  "charsPerToken": 4
}
```

**Larger context (32K models):**

```json
{
  "maxTokens": 28000,
  "keepRecentTokens": 4000,
  "charsPerToken": 4
}
```

**Small context (4K models):**

```json
{
  "maxTokens": 3000,
  "keepRecentTokens": 800,
  "charsPerToken": 4
}
```
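The presets above all leave headroom below the model's hard limit for the reply and the injected summary. A rough rule of thumb, assuming a 25% reserve (the fraction is an assumption, not a plugin rule):

```typescript
// One way to pick maxTokens for a given model: reserve part of the
// context window for the model's reply and the injected summary.
// The 25% default reserve is an assumption, not a plugin rule.
function suggestMaxTokens(contextWindow: number, reserveFraction = 0.25): number {
  return Math.floor(contextWindow * (1 - reserveFraction));
}

// suggestMaxTokens(8192) -> 6144, close to the 6000 preset above
```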
### /compact-now

Force clear the summary cache and trigger fresh compaction on the next message.

```
/compact-now
```

### /context-stats

Show current context token usage and whether compaction would trigger.

```
/context-stats
```
Output:

```
📊 Context Stats

Messages: 47 total
- User: 23
- Assistant: 24
- System: 0

Estimated Tokens: ~6,234
Limit: 8,000
Usage: 77.9%

✅ Within limits
```
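The numbers in this report can be derived from the same character-based estimate; a hypothetical sketch (not the plugin's source):

```typescript
// Derive /context-stats-style numbers from the charsPerToken estimate.
type M = { role: "user" | "assistant" | "system"; content: string };

function contextStats(messages: M[], maxTokens: number, charsPerToken = 4) {
  const byRole = { user: 0, assistant: 0, system: 0 };
  let chars = 0;
  for (const m of messages) {
    byRole[m.role]++;
    chars += m.content.length;
  }
  const tokens = Math.ceil(chars / charsPerToken);
  return {
    total: messages.length,
    byRole,
    tokens,
    usagePct: Math.round((tokens / maxTokens) * 1000) / 10, // one decimal place
    withinLimits: tokens <= maxTokens,
  };
}
```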
When compaction triggers, older messages are summarized via an LLM call using the session model (or the model set in `summaryModel`). If the LLM runtime isn't available (e.g., during startup), a fallback truncation-based summary is used.
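A truncation-based fallback in the spirit described above might keep the head of each old message within the `summaryMaxTokens` budget. This is purely illustrative of the idea, not the plugin's implementation:

```typescript
// Fallback summary when no LLM is available: truncate each old message
// to an equal share of the summaryMaxTokens budget (in characters).
function fallbackSummary(
  old: { role: string; content: string }[],
  summaryMaxTokens: number,
  charsPerToken = 4,
): string {
  const budgetChars = summaryMaxTokens * charsPerToken;
  const perMessage = Math.max(1, Math.floor(budgetChars / Math.max(1, old.length)));
  const lines = old.map((m) => `${m.role}: ${m.content.slice(0, perMessage)}`);
  return `[Compacted ${old.length} earlier messages]\n` + lines.join("\n");
}
```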
| Feature | Built-in | Context Compactor |
|---------|----------|-------------------|
| Trigger | Model reports overflow | Token estimate threshold |
| Works with local models | ❌ (needs overflow error) | ✅ |
| Persists to transcript | ✅ | ❌ (session-only) |
| Summarization | Pi runtime | Plugin LLM call |

Context Compactor is complementary: it catches cases before they hit the model's hard limit.
**Summary quality is poor:**
- Use a stronger `summaryModel`
- Increase `summaryMaxTokens`

**Compaction triggers too often:**
- Raise `maxTokens`
- Lower `keepRecentTokens` (keeps less, summarizes earlier)

**Not compacting when expected:**
- Run `/context-stats` to see current usage
- Verify `enabled: true` in config
- Check logs for `[context-compactor]` messages

**Characters per token wrong:**
- Adjust `charsPerToken` to match your model's tokenizer
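If estimates consistently drift, you can calibrate `charsPerToken` from a sample: divide the sample's character count by the token count your runtime actually reports. A hypothetical helper:

```typescript
// Calibrate charsPerToken from a sample text and the token count your
// local runtime reports for it. Illustrative helper, not plugin API.
function calibrateCharsPerToken(sampleText: string, actualTokens: number): number {
  return Math.round((sampleText.length / actualTokens) * 10) / 10;
}

// e.g. a 4,200-char sample that tokenizes to 1,200 tokens -> 3.5
```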
Enable debug logging:

```json
{
  "plugins": {
    "entries": {
      "context-compactor": {
        "config": {
          "logLevel": "debug"
        }
      }
    }
  }
}
```
Look for:

- `[context-compactor] Current context: ~XXXX tokens`
- `[context-compactor] Compacted X messages → summary`