
Memory Hygiene: Keep Clawdbot's LanceDB Vector Memory Clean and Fast

March 16, 2026·6 min read

With 11,600+ downloads, the Memory Hygiene skill is the maintenance tool for Clawdbot's long-term memory system. Clawdbot uses LanceDB, an embedded vector database, to store conversation history and contextual embeddings for auto-recall. Over time, this database accumulates stale, redundant, and low-quality entries that degrade recall quality and bloat storage. Memory Hygiene audits, prunes, and reseeds the database to keep recall accurate and fast.

The Problem It Solves

Clawdbot's memory system is designed to help you: it captures important context from conversations and resurfaces it when relevant. But like any database that only ever grows, LanceDB will eventually:

Accumulate noise — low-relevance fragments from old conversations that match queries and crowd out useful results
Grow in size — unbounded growth slows down vector similarity searches
Contain contradictions — if you've changed how you work or what you prefer, old entries can override newer, more accurate ones
Drift from reality — information you stored six months ago about "current projects" is now outdated

Memory Hygiene solves this by giving you visibility into what's in LanceDB and the tools to clean it up.

Understanding Clawdbot's LanceDB Memory

Before running Memory Hygiene, it helps to understand the storage architecture:

~/.clawdbot/
  memory/
    lancedb/               # Vector database files
      conversation_history/  # Raw conversation turns
      embeddings/          # Semantic embedding store
      index/               # Vector similarity index

Each stored memory is a vector embedding — a numerical representation of text that captures semantic meaning. When Clawdbot needs to recall relevant context, it performs a vector similarity search against this database.
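To make the recall step concrete, here is a minimal sketch of a vector similarity search in plain Python. It assumes cosine similarity as the metric and uses tiny hand-written vectors; the article does not say which metric or embedding model Clawdbot uses, and the `recall` function and sample data are hypothetical stand-ins, not Clawdbot or LanceDB APIs.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recall(query_vec, memories, k=3):
    """Return the k (score, text) pairs most similar to query_vec, best first.

    `memories` is a list of (text, vector) pairs standing in for stored rows.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional embeddings; real embeddings have hundreds of dimensions.
memories = [
    ("prefers tabs over spaces", [0.9, 0.1, 0.0]),
    ("current project: billing API", [0.1, 0.9, 0.2]),
    ("heartbeat ping", [0.0, 0.1, 0.9]),
]
top = recall([0.8, 0.2, 0.0], memories, k=2)
```

Every stored entry competes in this ranking, which is why noisy or outdated entries can push useful ones out of the top results.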

The quality of recall depends on:

Index freshness — outdated entries reduce signal-to-noise ratio
Deduplication — near-duplicate entries waste space and dilute results
Relevance scoring — entries from deeply contextual conversations are more valuable than throwaway interactions

Core Operations

Memory Audit

Audit my Clawdbot memory database

The skill analyzes the current LanceDB state and reports:

Total number of stored embeddings
Age distribution (entries from last week, month, 3 months, 1 year+)
Estimated storage size
Duplicate or near-duplicate entry count
Low-confidence entries (stored from weak signals)
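As a rough illustration of what such a report involves, the sketch below computes the total count, an age distribution, and a low-confidence count over in-memory records. The field names (`created_at`, `confidence`), the 0.5 confidence cutoff, and the extra "3 to 12 months" bucket are assumptions made for this example; the skill's actual schema and thresholds are not documented here.

```python
from collections import Counter
from datetime import datetime, timedelta

# Upper bound in days for each age bucket, checked in order.
AGE_BUCKETS = [
    (7, "last week"),
    (30, "last month"),
    (90, "last 3 months"),
    (365, "3 to 12 months"),  # assumed bucket to cover the gap before "1 year+"
]

def age_bucket(created_at, now):
    """Map an entry's creation time to a human-readable age bucket."""
    age_days = (now - created_at).days
    for limit, label in AGE_BUCKETS:
        if age_days <= limit:
            return label
    return "1 year+"

def audit(entries, now, low_confidence_below=0.5):
    """Summarize entries: total count, count per age bucket, low-confidence count."""
    by_age = Counter(age_bucket(e["created_at"], now) for e in entries)
    low = sum(1 for e in entries if e["confidence"] < low_confidence_below)
    return {"total": len(entries), "by_age": dict(by_age), "low_confidence": low}

now = datetime(2026, 3, 16)
entries = [
    {"created_at": now - timedelta(days=2), "confidence": 0.9},
    {"created_at": now - timedelta(days=45), "confidence": 0.3},
    {"created_at": now - timedelta(days=400), "confidence": 0.2},
]
report = audit(entries, now)
```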

Full Wipe and Reseed

For severe cases where the database is dominated by noise, a full wipe is faster than incremental cleanup:

Wipe my LanceDB memory and reseed from my MEMORY.md

The skill:

Deletes ~/.clawdbot/memory/lancedb/
Restarts the Gateway daemon to clear in-memory state
Re-reads your curated MEMORY.md file
Reseeds LanceDB with only the facts you've explicitly preserved

This "wipe and reseed" approach treats your handcrafted MEMORY.md as the source of truth — a fundamentally different mental model from incremental cleanup. It's the right choice when auto-capture has filled the DB with transient noise (heartbeat entries, test conversations, etc.).
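The file-handling parts of this workflow can be sketched as follows. Two things here are assumptions rather than documented behavior: that MEMORY.md keeps one fact per Markdown bullet, and that moving the database directory aside is an acceptable substitute for deleting it (a deliberate swap, so a backup survives). Restarting the Gateway daemon and re-embedding the facts are left out because the article does not give the commands or APIs for them.

```python
import shutil
from pathlib import Path

def parse_memory_md(text):
    """Collect one fact per Markdown bullet line ("- " or "* "); skip everything else."""
    facts = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("- ", "* ")):
            fact = stripped[2:].strip()
            if fact:
                facts.append(fact)
    return facts

def wipe_with_backup(db_dir, backup_dir):
    """Move the database directory to backup_dir instead of deleting it.

    Returns True if a directory was moved, False if there was nothing to wipe.
    """
    db_dir, backup_dir = Path(db_dir), Path(backup_dir)
    if backup_dir.exists():
        raise FileExistsError(f"backup target already exists: {backup_dir}")
    if not db_dir.exists():
        return False
    shutil.move(str(db_dir), str(backup_dir))
    return True

sample = "# Preferences\n- Uses tabs, not spaces\n\n* Timezone: UTC\nfree-form note\n"
facts = parse_memory_md(sample)
```

Once you have confirmed that the reseeded memory behaves as expected, the backup directory can be deleted by hand.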

Clean Stale Entries

Clean up old memory entries I no longer need

Remove memory entries older than 3 months

The skill removes entries below a relevance or age threshold. You can configure:

Age cutoff (default: entries older than 90 days)
Relevance score threshold
Specific conversation sessions to purge
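A minimal sketch of how these three options might combine is shown below. It only selects candidates and deletes nothing, which makes it usable as a dry run. The field names (`created_at`, `relevance`, `session_id`) are hypothetical, and treating the criteria as "any one is enough" is an assumption based on the description above.

```python
from datetime import datetime, timedelta

def select_stale(entries, now, max_age_days=90, min_relevance=None, purge_sessions=()):
    """Return entries a clean pass would remove.

    An entry is selected if it is older than the cutoff, scores below
    min_relevance (when one is given), or belongs to a purged session.
    """
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for entry in entries:
        too_old = entry["created_at"] < cutoff
        low_relevance = min_relevance is not None and entry["relevance"] < min_relevance
        purged = entry["session_id"] in purge_sessions
        if too_old or low_relevance or purged:
            stale.append(entry)
    return stale

now = datetime(2026, 3, 16)
entries = [
    {"id": "a", "created_at": now - timedelta(days=10), "relevance": 0.8, "session_id": "s1"},
    {"id": "b", "created_at": now - timedelta(days=120), "relevance": 0.9, "session_id": "s1"},
    {"id": "c", "created_at": now - timedelta(days=5), "relevance": 0.1, "session_id": "s2"},
    {"id": "d", "created_at": now - timedelta(days=1), "relevance": 0.7, "session_id": "s3"},
]
to_remove = select_stale(entries, now, min_relevance=0.3, purge_sessions={"s3"})
```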

Deduplicate

Remove duplicate memory entries

Near-duplicate embeddings (vector similarity > 0.95) are merged or pruned, keeping the most recent or highest-confidence version.
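One simple way to implement this is a greedy pass: visit entries from newest to oldest and keep an entry only if it is not too similar to anything already kept. The sketch below assumes cosine similarity and keeps the most recent version; the `vector` and `created_at` fields are hypothetical, and the skill's real merge logic may differ. The pairwise comparison is quadratic, so it suits small stores or batches.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def deduplicate(entries, threshold=0.95):
    """Greedy near-duplicate removal that keeps the newest entry of each group."""
    kept = []
    # ISO date strings sort chronologically, so newest entries come first.
    for entry in sorted(entries, key=lambda e: e["created_at"], reverse=True):
        is_duplicate = any(
            cosine_similarity(entry["vector"], other["vector"]) > threshold
            for other in kept
        )
        if not is_duplicate:
            kept.append(entry)
    return kept

entries = [
    {"id": "old", "created_at": "2026-01-01", "vector": [1.0, 0.0]},
    {"id": "new", "created_at": "2026-02-01", "vector": [0.99, 0.05]},
    {"id": "other", "created_at": "2026-03-01", "vector": [0.0, 1.0]},
]
unique = deduplicate(entries)
```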

Reseed High-Value Memories

Reseed my memory with the most important context from recent conversations

After cleaning, the skill can re-index recent conversations with fresh embeddings, ensuring the highest-quality recent context is well-represented in the database.

Disable autoCapture

A key configuration option that Memory Hygiene helps manage:

Disable autoCapture for my memory system

autoCapture is Clawdbot's setting that automatically stores conversation fragments to LanceDB. When enabled indiscriminately, it captures everything — including low-value exchanges that add noise. Memory Hygiene recommends disabling autoCapture and instead using explicit memory commands (/remember this, @store) to ensure only high-value context is stored.

// Recommended configuration (set in clawdbot config):
{
  "memory": {
    "autoCapture": false,
    "explicitOnly": true
  }
}
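If you prefer to apply this change with a script, the sketch below updates an already-parsed config while leaving unrelated settings untouched. It works on a plain dictionary because the location and exact format of the Clawdbot config file are not covered here; `someOtherSetting` and `other` are hypothetical keys used only to show that existing values survive. If your config file contains comments, Python's standard `json` module will not parse it.

```python
import copy
import json

def apply_explicit_only(config):
    """Return a copy of config with auto-capture off and explicit-only capture on."""
    updated = copy.deepcopy(config)
    memory = updated.setdefault("memory", {})
    memory["autoCapture"] = False
    memory["explicitOnly"] = True
    return updated

# Hypothetical existing config; only the two memory flags come from the article.
existing = {"memory": {"autoCapture": True, "someOtherSetting": 42}, "other": {"keep": 1}}
updated = apply_explicit_only(existing)
print(json.dumps(updated, indent=2))
```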

The Monthly Maintenance Workflow

Memory Hygiene is designed to be run as a monthly cron job:

Set up a monthly memory hygiene cron job

# Manual equivalent:
clawdbot cron add \
  --name "Monthly Memory Hygiene" \
  --cron "0 3 1 * *" \
  --session isolated \
  --message "Run memory hygiene: audit, remove entries older than 60 days, deduplicate, and reseed from last month's high-value conversations. Report what was cleaned."

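This runs on the 1st of each month at 3 AM, keeping the database lean without requiring manual intervention. Note that the example message asks for a 60-day cutoff, which is stricter than the 90-day default; adjust the number to match your own retention needs.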
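The five fields of the cron expression `0 3 1 * *` are minute (0), hour (3), day of month (1), month (any), and day of week (any). The small sketch below works out the next matching time from a given moment, which is handy for checking when the job will first fire. `next_monthly_run` is a hypothetical helper written for this post, not part of the clawdbot CLI, and it only handles this "fixed day and time every month" shape.

```python
from datetime import datetime

def next_monthly_run(after, day=1, hour=3, minute=0):
    """First datetime strictly after `after` matching cron "minute hour day * *".

    `day` must exist in every month (1 to 28) for this simple approach to work.
    """
    candidate = after.replace(day=day, hour=hour, minute=minute, second=0, microsecond=0)
    if candidate > after:
        return candidate
    # This month's slot has passed, so roll over to the next month.
    if after.month == 12:
        year, month = after.year + 1, 1
    else:
        year, month = after.year, after.month + 1
    return candidate.replace(year=year, month=month)

first_run = next_monthly_run(datetime(2026, 3, 16, 12, 0))
```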

What Gets Cleaned vs. Preserved

The skill decides what to remove and what to keep using these retention heuristics:

Cleaned (low value):

Entries from sessions shorter than 2 minutes
Single-turn exchanges with no follow-up context
Exact or near-exact duplicates (keep most recent)
Entries flagged as test/experimental by context
Entries older than the configured cutoff (default: 90 days)

Preserved (high value):

Entries explicitly stored with /remember or @store
Entries referenced in multiple subsequent conversations
Technical decisions, preferences, and configuration notes
Entries with high semantic coherence scores
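The rule-based part of these heuristics can be sketched as a single decision function. The field names are hypothetical, and so is the precedence: this example checks the "preserve" rules first so that an explicitly stored entry is never removed for being old. Duplicate detection is a separate pass, and the content-based criteria (technical decisions, semantic coherence) are omitted because they need a model rather than simple rules.

```python
from datetime import datetime, timedelta

def retention_decision(entry, now, max_age_days=90):
    """Return "preserve" or "clean" for one entry using simple rule checks."""
    # Preserve rules win over clean rules (an assumption for this sketch).
    if entry.get("explicit"):                 # stored via /remember or @store
        return "preserve"
    if entry.get("reference_count", 0) >= 2:  # referenced in multiple later conversations
        return "preserve"
    if entry["session_seconds"] < 120:        # session shorter than 2 minutes
        return "clean"
    if entry["turns"] <= 1:                   # single-turn exchange, no follow-up
        return "clean"
    if entry.get("flagged_test"):             # marked as test/experimental
        return "clean"
    if now - entry["created_at"] > timedelta(days=max_age_days):
        return "clean"
    return "preserve"

now = datetime(2026, 3, 16)
old_but_explicit = {
    "explicit": True, "session_seconds": 30, "turns": 1,
    "created_at": now - timedelta(days=400),
}
quick_chat = {
    "session_seconds": 45, "turns": 3,
    "created_at": now - timedelta(days=1),
}
decisions = [retention_decision(e, now) for e in (old_but_explicit, quick_chat)]
```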

Practical Tips

Disable autoCapture first: before running your first clean, switch to explicit-only memory capture; this prevents the database from refilling with noise immediately after cleaning
Run before major projects: cleaning memory before starting a new project gives you fresh, relevant recall without old context interfering
Audit before cleaning: always audit first so you know what's in your database before running a destructive operation; surprises are rare but possible
Preserve your preferences: if you have explicitly stored preferences and working styles, check the audit output to make sure the clean operation won't remove them
Reseed after major changes: if your work context has changed significantly (new job, new project, new technical stack), clean out the old entries and then reseed from recent conversations

Considerations

Irreversible cleanup: deleted LanceDB entries cannot be recovered; always review the audit report before confirming a clean operation
Performance after reindex: the vector index rebuild after cleaning takes a few minutes; memory recall may be slower during this window
Explicit memory still needs curation: disabling autoCapture and using explicit-only memory is better, but you'll still accumulate entries over time; the monthly maintenance workflow handles this
LanceDB only: this skill targets LanceDB specifically; if your Clawdbot setup uses an alternative memory backend (PostgreSQL pgvector, etc.), check your configuration before assuming the skill applies

The Bigger Picture

Memory Hygiene addresses a fundamental challenge in long-running AI assistant deployments: context quality degrades over time if it is not actively managed. A Clawdbot instance that runs for a year without memory maintenance is like a developer whose notes were never reviewed or cleaned up: full of outdated information, contradictions, and noise that reduce rather than enhance productivity. With 11,600+ downloads, the community has recognized that managing AI memory is not a one-time setup task but an ongoing maintenance practice, as important as keeping your codebase clean.

View the skill on ClawHub: memory-hygiene
