# Claw Compactor v6.0

50%+ savings through rule-based compression, dictionary encoding, session observation compression, and progressive context loading.

Install via ClawdBot CLI:

```shell
clawdbot install aeromomo/cut-your-tokens-97percent-savings-on-session-transcripts-via-observation-extraction
```

*"Cut your tokens. Keep your facts."*
Cut your AI agent's token spend in half. One command compresses your entire workspace (memory files, session transcripts, sub-agent context) using 5 layered compression techniques. Deterministic. Mostly lossless. No LLM required.
The `full` command runs everything in optimal order.

| # | Layer | Method | Savings | Lossless? |
|---|-------|--------|---------|-----------|
| 1 | Rule engine | Dedup lines, strip markdown filler, merge sections | 4-8% | ✅ |
| 2 | Dictionary encoding | Auto-learned codebook, $XX substitution | 4-5% | ✅ |
| 3 | Observation compression | Session JSONL → structured summaries | ~97% | ❌* |
| 4 | RLE patterns | Path shorthand ($WS), IP prefix, enum compaction | 1-2% | ✅ |
| 5 | Compressed Context Protocol | ultra/medium/light abbreviation | 20-60% | ❌* |
\*Lossy techniques preserve all facts and decisions; only verbose formatting is removed.
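To make layer 2 concrete, here is a minimal sketch of dictionary encoding with an auto-learned codebook. The `$XX` code format comes from the table above; the learning heuristic (long words repeated more than twice) is illustrative, not the tool's actual algorithm.

```python
from collections import Counter
import re

def learn_codebook(text: str, max_codes: int = 100) -> dict:
    """Map frequently repeated long words to short $XX hex codes."""
    words = re.findall(r"\b\w{8,}\b", text)  # only long tokens are worth encoding
    common = Counter(words).most_common(max_codes)
    return {w: f"${i:02X}" for i, (w, n) in enumerate(common) if n > 2}

def compress_text(text: str, codebook: dict) -> str:
    """Substitute each codebook phrase with its code, longest phrase first."""
    for word, code in sorted(codebook.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(word, code)
    return text
```

Decompression simply reverses the mapping, which is why the codebook JSON must travel with the compressed files.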
```shell
git clone https://github.com/aeromomo/claw-compactor.git
cd claw-compactor

# See how much you'd save (non-destructive)
python3 scripts/mem_compress.py /path/to/workspace benchmark

# Compress everything
python3 scripts/mem_compress.py /path/to/workspace full
```
Requirements: Python 3.9+. Optional: `pip install tiktoken` for exact token counts (otherwise a heuristic fallback is used).
```
┌─────────────────────────────────────────────────────────────┐
│                      mem_compress.py                        │
│                   (unified entry point)                     │
└──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬────┘
       │      │      │      │      │      │      │      │
       ▼      ▼      ▼      ▼      ▼      ▼      ▼      ▼
  estimate compress dict dedup observe tiers  audit optimize
       └──────┴──────┴──┬───┴──────┴──────┴──────┘
                        ▼
               ┌────────────────┐
               │ lib/           │
               │  tokens.py     │ ← tiktoken or heuristic
               │  markdown.py   │ ← section parsing
               │  dedup.py      │ ← shingle hashing
               │  dictionary.py │ ← codebook compression
               │  rle.py        │ ← path/IP/enum encoding
               │  tokenizer_    │
               │   optimizer.py │ ← format optimization
               │  config.py     │ ← JSON config
               │  exceptions.py │ ← error types
               └────────────────┘
```
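As an illustration of what `dedup.py`'s shingle hashing does, here is a sketch of near-duplicate detection via Jaccard similarity over word 3-shingles. This matches the `dedup_shingle_size` and `dedup_similarity_threshold` config fields described later, but the actual implementation may differ in detail.

```python
def shingles(text: str, k: int = 3) -> set:
    """Word-level k-shingles of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity between the shingle sets of two texts."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Two sections scoring above the similarity threshold (0.6 by default) would be flagged as duplicates.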
All commands follow the pattern `python3 scripts/mem_compress.py <workspace> <command>`:
| Command | Description | Typical Savings |
|---------|-------------|-----------------|
| full | Complete pipeline (all steps in order) | 50%+ combined |
| benchmark | Dry-run performance report | — |
| compress | Rule-based compression | 4-8% |
| dict | Dictionary encoding with auto-codebook | 4-5% |
| observe | Session transcript → observations | ~97% |
| tiers | Generate L0/L1/L2 summaries | 88-95% on sub-agent loads |
| dedup | Cross-file duplicate detection | varies |
| estimate | Token count report | — |
| audit | Workspace health check | — |
| optimize | Tokenizer-level format fixes | 1-3% |
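To give a feel for what `tiers` produces, a naive L0 generator might simply cap memory content at the configured token budget using the chars-per-token heuristic. This is a deliberately crude sketch; the real tool presumably summarizes rather than truncates.

```python
def make_l0(text: str, max_tokens: int = 200, chars_per_token: int = 4) -> str:
    """Naive L0 tier: cap content at roughly max_tokens via the chars/token heuristic."""
    budget = max_tokens * chars_per_token
    if len(text) <= budget:
        return text
    # cut at the last word boundary inside the budget, mark the truncation
    return text[:budget].rsplit(" ", 1)[0] + " …"
```

A sub-agent loading only the L0 tier then pays ~200 tokens instead of the full memory file, which is where the 88-95% savings on sub-agent loads comes from.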
Global flags:

- `--json` — Machine-readable JSON output
- `--dry-run` — Preview changes without writing
- `--since YYYY-MM-DD` — Filter sessions by date
- `--auto-merge` — Auto-merge duplicates (`dedup` only)

| Workspace State | Typical Savings | Notes |
|---|---|---|
| Session transcripts (observe) | ~97% | Megabytes of JSONL → concise observation MD |
| Verbose/new workspace | 50-70% | First run on unoptimized workspace |
| Regular maintenance | 10-20% | Weekly runs on active workspace |
| Already-optimized | 3-12% | Diminishing returns — workspace is clean |
Before compression runs, enable prompt caching for a 90% discount on cached tokens:
```json
{
  "models": {
    "model-name": {
      "cacheRetention": "long"
    }
  }
}
```
Compression reduces token count; caching reduces cost per token. Together: 50% compression + 90% cache discount = 95% effective cost reduction.
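The arithmetic behind that combined figure, as a quick sanity check:

```python
# Compression halves the token count; caching then discounts the surviving
# tokens by 90%, so only 5% of the baseline cost remains.
tokens = 1_000_000
baseline_cost = tokens * 1.0                     # arbitrary unit price per token
after_compression = tokens * 0.5                 # 50% compression
effective_cost = after_compression * (1 - 0.90)  # 90% cache discount
reduction = 1 - effective_cost / baseline_cost   # fraction of cost eliminated
```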
Run weekly or on heartbeat:
```
## Memory Maintenance (weekly)
- python3 skills/claw-compactor/scripts/mem_compress.py <workspace> benchmark
- If savings > 5%: run full pipeline
- If pending transcripts: run observe
```
Cron example:
```shell
0 3 * * 0 cd /path/to/skills/claw-compactor && python3 scripts/mem_compress.py /path/to/workspace full
```
Optional `claw-compactor-config.json` in the workspace root:

```json
{
  "chars_per_token": 4,
  "level0_max_tokens": 200,
  "level1_max_tokens": 500,
  "dedup_similarity_threshold": 0.6,
  "dedup_shingle_size": 3
}
```
All fields optional — sensible defaults are used when absent.
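A loader with that "defaults unless overridden" behavior might look like the sketch below. The file name and field names come from the example above; this loader is illustrative, not the tool's actual `config.py`.

```python
import json
from pathlib import Path

DEFAULTS = {
    "chars_per_token": 4,
    "level0_max_tokens": 200,
    "level1_max_tokens": 500,
    "dedup_similarity_threshold": 0.6,
    "dedup_shingle_size": 3,
}

def load_config(workspace: Path) -> dict:
    """Merge the optional workspace config file over the built-in defaults."""
    path = workspace / "claw-compactor-config.json"
    config = dict(DEFAULTS)                          # start from sensible defaults
    if path.exists():
        config.update(json.loads(path.read_text()))  # user-supplied values win
    return config
```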
| File | Purpose |
|------|---------|
| memory/.codebook.json | Dictionary codebook (must travel with memory files) |
| memory/.observed-sessions.json | Tracks processed transcripts |
| memory/observations/ | Compressed session summaries |
| memory/MEMORY-L0.md | Level 0 summary (~200 tokens) |
Q: Will compression lose my data?
A: Rule engine, dictionary, RLE, and tokenizer optimization are fully lossless. Observation compression and CCP are lossy but preserve all facts and decisions.
Q: How does dictionary decompression work?
A: `decompress_text(text, codebook)` expands all `$XX` codes back. The codebook JSON must be present.
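A minimal sketch of what that expansion might look like, assuming the codebook JSON maps phrase → code as described above (the real `dictionary.py` may differ):

```python
import re

def decompress_text(text: str, codebook: dict) -> str:
    """Expand every $XX code back to its original phrase."""
    reverse = {code: phrase for phrase, code in codebook.items()}
    # leave any unknown $XX sequence untouched rather than guessing
    return re.sub(r"\$[0-9A-F]{2}",
                  lambda m: reverse.get(m.group(0), m.group(0)),
                  text)
```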
Q: Can I run individual steps?
A: Yes. Every command is independent: compress, dict, observe, tiers, dedup, optimize.
Q: What if tiktoken isn't installed?
A: Falls back to a CJK-aware heuristic (chars÷4). Results are ~90% accurate.
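A CJK-aware chars÷4 fallback could be sketched as follows, assuming CJK characters count as roughly one token each while ASCII text averages four characters per token; the tool's exact heuristic may differ.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Heuristic token count: 1 token per CJK char, chars/4 for the rest."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return cjk + (len(text) - cjk) // chars_per_token
```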
Q: Does it handle Chinese/Japanese/Unicode?
A: Yes. Full CJK support including character-aware token estimation and Chinese punctuation normalization.
Troubleshooting:

- `FileNotFoundError` on workspace: ensure the path points to the workspace root (contains `memory/` or `MEMORY.md`)
- Codebook errors: check that `memory/.codebook.json` exists and is valid JSON
- `benchmark` reports little to save: the workspace is already optimized; nothing to do
- `observe` finds no transcripts: check the sessions directory for `.jsonl` files
- Inexact token counts: `pip3 install tiktoken`

License: MIT
Generated Mar 1, 2026
A studio building multiple AI agents for clients needs to reduce token costs across all projects. Claw Compactor can compress session transcripts and memory files, cutting token spend by 50%+ and integrating with weekly maintenance workflows to keep costs low as agents scale.
A company uses AI agents to handle customer support chats, generating large volumes of session data. The observe layer compresses JSONL transcripts by ~97%, preserving key facts while reducing storage and processing costs for historical analysis and training.
Research teams employ AI agents to process and summarize academic papers or datasets, leading to verbose context files. Claw Compactor's tiered summaries and rule-based compression enable efficient progressive loading, saving 50-70% on token usage for large-scale projects.
An agency uses AI agents to generate and edit content across multiple clients, accumulating memory files and session logs. The tool's dictionary encoding and dedup features reduce redundancy by 4-8%, optimizing workspace size for faster iterations and lower operational costs.
IT departments deploy AI agents to monitor system logs and generate reports, producing extensive session transcripts. Claw Compactor's observation compression and RLE layers cut data volume by ~97% and 1-2% respectively, enabling cost-effective long-term retention and audit trails.
Offer Claw Compactor as a cloud-based service with tiered plans based on workspace size or compression volume. Revenue comes from monthly subscriptions, targeting AI developers and enterprises seeking predictable cost reduction without infrastructure management.
Provide professional services to integrate Claw Compactor into existing AI workflows, including custom configuration and automation setup. Revenue is generated through one-time project fees or ongoing support contracts, ideal for complex enterprise deployments.
Distribute the core tool as open source to build community adoption, while monetizing advanced features like real-time analytics, enhanced security, or priority support. Revenue streams include paid licenses or enterprise add-ons for large-scale users.
💬 Integration Tip
Integrate Claw Compactor into existing CI/CD pipelines by automating weekly runs with cron jobs, and combine it with prompt caching for up to 95% cost reduction.