Memory Hygiene: Keep Clawdbot's LanceDB Vector Memory Clean and Fast
With 11,600+ downloads, the Memory Hygiene skill is the maintenance tool for Clawdbot's long-term memory system. Clawdbot uses LanceDB, an embedded vector database, to store conversation history and contextual embeddings for auto-recall. Over time this database accumulates stale, redundant, and low-quality entries that degrade recall quality and bloat storage. Memory Hygiene audits, prunes, and reseeds the database to keep recall accurate and fast.
The Problem It Solves
Clawdbot's memory system is designed to help you: it captures important context from conversations and resurfaces it when relevant. But like any database that only ever grows, LanceDB will eventually:
- Accumulate noise — low-relevance fragments from old conversations that match queries and crowd out useful results
- Grow in size — unbounded growth slows down vector similarity searches
- Contain contradictions — if you've changed how you work or what you prefer, old entries can override newer, more accurate ones
- Drift from reality — information you stored six months ago about "current projects" is now outdated
Memory Hygiene solves this by giving you visibility into what's in LanceDB and the tools to clean it up.
Understanding Clawdbot's LanceDB Memory
Before running Memory Hygiene, it helps to understand the storage architecture:
~/.clawdbot/
  memory/
    lancedb/                 # Vector database files
    conversation_history/    # Raw conversation turns
    embeddings/              # Semantic embedding store
    index/                   # Vector similarity index
Each stored memory is a vector embedding — a numerical representation of text that captures semantic meaning. When Clawdbot needs to recall relevant context, it performs a vector similarity search against this database.
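To make the recall step concrete, here is a minimal brute-force sketch of vector similarity search. It assumes cosine similarity as the metric and uses hypothetical toy vectors and memory ids; LanceDB itself searches through an index rather than scanning every row, and the metric Clawdbot actually configures may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall(query: list[float], memories: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k stored memories most similar to the query.

    Brute force: scores every stored vector. A real vector database uses an
    index to avoid this full scan.
    """
    ranked = sorted(
        memories,
        key=lambda mem_id: cosine_similarity(query, memories[mem_id]),
        reverse=True,
    )
    return ranked[:k]
```

With toy memories such as `{"prefers-tabs": [0.9, 0.1, 0.0], "project-x": [0.0, 1.0, 0.2]}`, a query of `[1.0, 0.0, 0.0]` ranks `"prefers-tabs"` first. Every stale or duplicate vector in the table is another candidate competing for those top-k slots, which is why cleanup improves recall.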
The quality of recall depends on:
- Index freshness — outdated entries reduce signal-to-noise ratio
- Deduplication — near-duplicate entries waste space and dilute results
- Relevance scoring — entries from deeply contextual conversations are more valuable than throwaway interactions
Core Operations
Memory Audit
Audit my Clawdbot memory database
The skill analyzes the current LanceDB state and reports:
- Total number of stored embeddings
- Age distribution (entries from last week, month, 3 months, 1 year+)
- Estimated storage size
- Duplicate or near-duplicate entry count
- Low-confidence entries (stored from weak signals)
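The audit figures above can be sketched over an in-memory list of records. The `Memory` fields, the 0.5 low-confidence cutoff, and the extra "last year" bucket are hypothetical choices for illustration, not Clawdbot's schema; the storage estimate assumes float32 vectors and ignores text and index overhead.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Memory:
    # Hypothetical record shape; Clawdbot's real LanceDB schema may differ.
    id: str
    text: str
    vector: list[float]
    created_at: datetime
    confidence: float  # 0.0-1.0, strength of the signal that captured it


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def audit(memories, now, dup_threshold=0.95, low_conf=0.5):
    """Summarize a memory table: size, age spread, duplicates, weak entries."""
    buckets = {"last_week": 0, "last_month": 0, "last_3_months": 0,
               "last_year": 0, "over_1_year": 0}
    for m in memories:
        age = now - m.created_at
        if age <= timedelta(days=7):
            buckets["last_week"] += 1
        elif age <= timedelta(days=30):
            buckets["last_month"] += 1
        elif age <= timedelta(days=90):
            buckets["last_3_months"] += 1
        elif age <= timedelta(days=365):
            buckets["last_year"] += 1
        else:
            buckets["over_1_year"] += 1
    # An entry counts as a near-duplicate if it is too similar to any earlier
    # entry. O(n^2): fine for a sketch, not for a large table.
    near_dupes = sum(
        1 for i, m in enumerate(memories)
        if any(_cosine(m.vector, o.vector) > dup_threshold for o in memories[:i])
    )
    dim = len(memories[0].vector) if memories else 0
    return {
        "total": len(memories),
        "age_distribution": buckets,
        "est_vector_bytes": len(memories) * dim * 4,  # float32 vectors only
        "near_duplicates": near_dupes,
        "low_confidence": sum(1 for m in memories if m.confidence < low_conf),
    }
```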
Full Wipe and Reseed
For severe cases where the database is dominated by noise, a full wipe is faster than incremental cleanup:
Wipe my LanceDB memory and reseed from my MEMORY.md
The skill:
- Deletes ~/.clawdbot/memory/lancedb/
- Restarts the Gateway daemon to clear in-memory state
- Re-reads your curated MEMORY.md file
- Reseeds LanceDB with only the facts you've explicitly preserved
This "wipe and reseed" approach treats your handcrafted MEMORY.md as the source of truth — a fundamentally different mental model from incremental cleanup. It's the right choice when auto-capture has filled the DB with transient noise (heartbeat entries, test conversations, etc.).
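The reseed step can be sketched as "parse the curated file, re-embed each fact". This assumes, hypothetically, that MEMORY.md lists one fact per `-` or `*` bullet; the real file layout and the embedding model are not specified here, so the embedding function is passed in as a parameter.

```python
from typing import Callable


def parse_memory_md(markdown: str) -> list[str]:
    """Extract one fact per bullet line; headings, prose, and blanks are skipped.

    Assumes a hypothetical convention of '- ' or '* ' bullets for facts.
    """
    facts = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(("- ", "* ")):
            fact = stripped[2:].strip()
            if fact:
                facts.append(fact)
    return facts


def reseed(markdown: str, embed: Callable[[str], list[float]]) -> list[dict]:
    """Build fresh rows for the emptied table: only curated facts, re-embedded."""
    return [{"text": fact, "vector": embed(fact)} for fact in parse_memory_md(markdown)]
```

Because nothing from the old table is carried over, the rebuilt database contains exactly what MEMORY.md says and nothing else; that is the "source of truth" model described above.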
Clean Stale Entries
Clean up old memory entries I no longer need
Remove memory entries older than 3 months
The skill removes entries that fall below a relevance threshold or are older than an age cutoff. You can configure:
- Age cutoff (default: entries older than 90 days)
- Relevance score threshold
- Specific conversation sessions to purge
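The three options above combine into a single selection rule. This is a minimal sketch over a hypothetical `Entry` record (the field names are assumptions); it only chooses which ids to delete, so the list can be reviewed before anything is removed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Entry:
    # Hypothetical fields; the real schema may use different names.
    id: str
    created_at: datetime
    relevance: float
    session_id: str


def select_stale(entries, now, max_age_days=90, min_relevance=0.0,
                 purge_sessions=frozenset()):
    """Return ids to delete: older than the cutoff, below the relevance
    threshold, or belonging to a session the user asked to purge."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        e.id for e in entries
        if e.created_at < cutoff
        or e.relevance < min_relevance
        or e.session_id in purge_sessions
    ]
```

With the lancedb Python package, such a list could feed a SQL-style predicate like `table.delete("id IN ('old', 'weak')")`, assuming the table has an `id` column. Since deletes are irreversible, compare the list with the audit report first.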
Deduplicate
Remove duplicate memory entries
Near-duplicate embeddings (vector similarity > 0.95) are merged or pruned, keeping the most recent or highest-confidence version.
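One simple way to implement this pruning is a greedy pass, sketched below under stated assumptions: the record fields are hypothetical, "most recent" is used as the primary ordering with confidence as the tie-break (matching the "keep most recent" rule for duplicates), and entries are pruned rather than merged. The skill's actual merge logic may differ.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Memory:
    # Hypothetical record shape.
    id: str
    vector: list[float]
    created_at: datetime
    confidence: float


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def deduplicate(memories, threshold=0.95):
    """Greedy near-duplicate pruning.

    Visit entries newest first (higher confidence breaks ties) and keep one
    only if it is not more than `threshold` similar to an entry already kept.
    """
    ordered = sorted(memories, key=lambda m: (m.created_at, m.confidence), reverse=True)
    kept = []
    for m in ordered:
        if all(cosine(m.vector, k.vector) <= threshold for k in kept):
            kept.append(m)
    return kept
```

Greedy pruning is order-dependent: similarity is not transitive, so a chain of entries each just above the threshold can collapse differently depending on which one is visited first. Sorting by recency makes that order deterministic.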
Reseed High-Value Memories
Reseed my memory with the most important context from recent conversations
After cleaning, the skill can re-index recent conversations with fresh embeddings, ensuring the highest-quality recent context is well-represented in the database.
Disable autoCapture
A key configuration option that Memory Hygiene helps manage:
Disable autoCapture for my memory system
autoCapture is Clawdbot's setting that automatically stores conversation fragments to LanceDB. When enabled, it captures everything, including low-value exchanges that add noise. Memory Hygiene recommends disabling autoCapture and instead using explicit memory commands (/remember this, @store) so that only high-value context is stored.
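The difference between the two modes comes down to a gate in front of the store. The sketch below is a hypothetical illustration, not Clawdbot's parser: it assumes an explicit command must be the first word of the message and that the text after it is what gets embedded.

```python
EXPLICIT_TRIGGERS = ("/remember", "@store")  # commands named in the text above


def _first_token(message: str) -> str:
    parts = message.split(maxsplit=1)
    return parts[0].lower() if parts else ""


def should_store(message: str, auto_capture: bool = False) -> bool:
    """With autoCapture on, every message is stored; with it off, only
    messages whose first word is an explicit trigger are stored."""
    if auto_capture:
        return True
    return _first_token(message) in EXPLICIT_TRIGGERS


def extract_payload(message: str) -> str:
    """Drop the leading trigger so only the fact itself is embedded."""
    parts = message.split(maxsplit=1)
    return parts[1].strip() if len(parts) == 2 else ""
```

A message like "thanks, that worked" never reaches the database in explicit-only mode, which is exactly the kind of fragment that auto-capture would otherwise accumulate.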
// Recommended configuration (set in clawdbot config):
{
  "memory": {
    "autoCapture": false,
    "explicitOnly": true
  }
}
The Monthly Maintenance Workflow
Memory Hygiene is designed to be run as a monthly cron job:
Set up a monthly memory hygiene cron job
# Manual equivalent:
clawdbot cron add \
  --name "Monthly Memory Hygiene" \
  --cron "0 3 1 * *" \
  --session isolated \
  --message "Run memory hygiene: audit, remove entries older than 60 days, deduplicate, and reseed from last month's high-value conversations. Report what was cleaned."
This runs at 3 AM on the 1st of each month, keeping the database lean without manual intervention.
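The cron fields read as minute 0, hour 3, day-of-month 1, any month, any weekday. As a worked check of that schedule, the hypothetical helper below computes the next matching run time. It is not a general cron parser, it uses naive datetimes (the scheduler's actual time zone is not covered here), and it assumes a day-of-month of 28 or less so that every month contains it.

```python
from datetime import datetime


def next_monthly_run(after: datetime, day: int = 1, hour: int = 3, minute: int = 0) -> datetime:
    """Next time strictly after `after` that matches cron 'minute hour day * *'.

    Assumes day <= 28 so the day exists in every month.
    """
    candidate = after.replace(day=day, hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= after:
        # This month's slot has passed: roll over to the same slot next month.
        if candidate.month == 12:
            candidate = candidate.replace(year=candidate.year + 1, month=1)
        else:
            candidate = candidate.replace(month=candidate.month + 1)
    return candidate
```

For example, a job registered mid-January first fires at 03:00 on February 1, and one registered at exactly 03:00 on December 1 waits until January 1 of the next year.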
What Gets Cleaned vs. Preserved
The skill applies the following retention heuristics:
Cleaned (low value):
- Entries from sessions shorter than 2 minutes
- Single-turn exchanges with no follow-up context
- Exact or near-exact duplicates (keep most recent)
- Entries flagged as test/experimental by context
- Entries older than the configured cutoff (default: 90 days)
Preserved (high value):
- Entries explicitly stored with /remember or @store
- Entries referenced in multiple subsequent conversations
- Technical decisions, preferences, and configuration notes
- Entries with high semantic coherence scores
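The two lists above can be read as a single classifier. The sketch below encodes them over hypothetical per-entry metadata; the field names, the 0.8 coherence cutoff, "multiple" meaning two or more references, and above all the precedence between conflicting rules are assumptions, since the lists do not say which wins when an entry matches both.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Entry:
    # Hypothetical per-entry metadata; the real schema is not documented here.
    created_at: datetime
    session_minutes: float   # length of the originating session
    turns: int               # turns in the exchange
    explicit: bool           # stored via /remember or @store
    reference_count: int     # later conversations that recalled this entry
    flagged_test: bool       # context marked it as test/experimental
    is_duplicate: bool       # a newer near-duplicate already exists
    kind: str = "chat"       # e.g. "decision", "preference", "config", "chat"
    coherence: float = 0.0   # hypothetical 0.0-1.0 semantic coherence score


PROTECTED_KINDS = {"decision", "preference", "config"}


def classify(e: Entry, now: datetime, max_age_days: int = 90,
             coherence_keep: float = 0.8) -> str:
    """Return "clean" or "preserve".

    Assumed precedence: duplicates are always cleaned (a newer copy survives),
    then any preserve signal wins, then any clean signal applies.
    """
    if e.is_duplicate:
        return "clean"
    if (e.explicit or e.reference_count >= 2 or e.kind in PROTECTED_KINDS
            or e.coherence >= coherence_keep):
        return "preserve"
    if (e.session_minutes < 2 or e.turns <= 1 or e.flagged_test
            or now - e.created_at > timedelta(days=max_age_days)):
        return "clean"
    return "preserve"
```

Letting preserve signals override the age cutoff is what keeps an old, explicitly stored preference from being swept out by a routine monthly clean; if the skill's real precedence differs, the audit report is where you would notice.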
Practical Tips
- Disable autoCapture first — before running your first clean, switch to explicit-only memory capture; this prevents the database from refilling with noise immediately after cleaning
- Run before major projects — cleaning memory before starting a new project ensures you get fresh, relevant recall without old context interfering
- Use the audit before cleaning — always audit first to understand what's in your database before running a destructive operation; surprises are rare but possible
- Preserve your preferences — if you have explicitly stored preferences and working styles, check the audit output to ensure the clean operation won't remove them
- Reseed after major changes: if your work context has changed significantly (new job, new project, new technical stack), reseed from recent conversations after cleaning out the old entries
Considerations
- Irreversible cleanup — deleted LanceDB entries cannot be recovered; always review the audit report before confirming a clean operation
- Performance after reindex — the vector index rebuild after cleaning takes a few minutes; memory recall may be slower during this window
- Explicit memory still needs curation — disabling autoCapture and using explicit-only memory is better, but you'll still accumulate entries over time; the monthly maintenance workflow handles this
- Not all Clawdbot setups use LanceDB — if you're using an alternative memory backend (PostgreSQL pgvector, etc.), this skill targets LanceDB specifically; check your configuration before assuming it applies
The Bigger Picture
Memory Hygiene addresses a fundamental challenge in long-running AI assistant deployments: context quality degrades over time if not actively managed. A Clawdbot instance running for a year without memory maintenance is like a developer whose notes system was never reviewed or cleaned, full of outdated information, contradictions, and noise that hinders rather than helps productivity. The skill's 11,600+ downloads suggest the community recognizes that managing AI memory is not a one-time setup task but an ongoing maintenance practice, as important as keeping your codebase clean.
View the skill on ClawHub: memory-hygiene