claw-drive
Claw Drive: AI-managed personal drive for OpenClaw. Auto-categorize, tag, deduplicate, and retrieve files with natural language. Backed by Google Drive for...
Install via ClawdBot CLI:
clawdbot install zhiyuanw101/claw-drive
Organize and retrieve personal files with auto-categorization and a searchable index.
File contents are personal data. Treat them accordingly.
identity/ files are ALWAYS sensitive: never read, never extract, never log their contents. Conversation transcripts are stored as .jsonl files, so once you read a file, its contents stay in the logs forever.
Data locality: All data stays on your machine. INDEX.jsonl, stored files, and the hash ledger are local. Conversation transcripts (.jsonl) are also local to your OpenClaw instance. Nothing is sent to external servers unless you explicitly enable Google Drive sync (optional, and it only syncs the files you choose).
Requirements:
- claw-drive CLI: brew install dissaozw/tap/claw-drive (or make install from the skill directory for manual setup)
- pymupdf for PDF text extraction (run via uv run --with pymupdf; no global install needed)
- rclone (for sync): brew install rclone
- fswatch (for sync): brew install fswatch
ALWAYS use the claw-drive CLI. NEVER use cp, mv, or direct file writes to ~/claw-drive/.
The CLI handles copying, hashing, deduplication, and index updates atomically. Bypassing it leaves the drive inconsistent, for example orphan files with no index entry and no dedup hash.
PATH note: If installed via Homebrew (brew install dissaozw/tap/claw-drive), the binary is in /opt/homebrew/bin/ and should be in PATH automatically. If installed manually, ~/.local/bin may not be in the agent shell's PATH — use the full path:
~/.local/bin/claw-drive store ...
If the manual symlink is broken, re-run make install from ~/.openclaw/skills/claw-drive/ to fix it.
claw-drive init [path]
This creates the directory structure, INDEX.jsonl, and hash ledger. Default path: ~/claw-drive.
When receiving a file (email attachment, Telegram upload, etc.):
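- Files destined for identity/ are always sensitive: never read their contents.
- For everything else, extract the content first: for PDFs, uv run --with pymupdf python3 -c "import pymupdf; ..."; for images, use the image tool.
- Check the existing folder structure (tree/find) before finalizing the destination.
- Store via the CLI (use the full path if claw-drive is not in PATH):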
claw-drive store <file> --category <cat> --name 'clean-name.ext' --desc 'Rich description with key details' --tags 'tag1, tag2' --source telegram
Always wrap --desc/--tags/--name values in single quotes when constructing shell commands. This avoids $ expansion (e.g. currency amounts like $941.39) and prevents metadata corruption. ⚠️ Do NOT use cp or write files directly to ~/claw-drive/; the CLI is the only correct way to store files.
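When the store command is assembled programmatically, the quoting can be delegated to Python's standard shlex module. The sketch below uses shlex.join, which single-quotes any argument containing shell metacharacters such as $ or spaces and also escapes embedded single quotes correctly (something hand-typed single quotes cannot do). The helper name build_store_command is hypothetical; the flags are the ones in the store command above.

```python
import shlex


def build_store_command(file, category, name, desc, tags, source):
    """Build a shell-safe `claw-drive store` command line (hypothetical helper)."""
    args = [
        "claw-drive", "store", file,
        "--category", category,
        "--name", name,
        "--desc", desc,
        "--tags", ", ".join(tags),  # same 'tag1, tag2' format as the example above
        "--source", source,
    ]
    # shlex.join quotes each argument as needed, so "$941.39" reaches the CLI
    # verbatim instead of being expanded by the shell.
    return shlex.join(args)
```

For example, a description of "Vet visit, $941.39" comes out as --desc 'Vet visit, $941.39'.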
If the file is a duplicate of one already stored, the CLI rejects it.
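Duplicate rejection is content-based: two files with identical bytes are the same file, whatever their names. The ledger format and hash algorithm claw-drive uses are not documented here, so the sketch below assumes SHA-256 over file contents and uses a plain dict as a stand-in for the hash ledger. sha256_of and try_register are hypothetical names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large PDFs are never loaded into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def try_register(path: Path, ledger: dict, stored_as: str) -> bool:
    """Record the file's hash; return False (reject) if identical content is already stored."""
    digest = sha256_of(path)
    if digest in ledger:
        return False
    ledger[digest] = stored_as
    return True
```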
The --name flag lets you override the original filename (which may be ugly like file_17---8c1ee63d-...) with a clean, descriptive name.
Do NOT read INDEX.jsonl directly in the main session. Spawn a search sub-agent instead. This keeps the index out of your context window and scales to large file collections.
The index grows with every stored file (~300 bytes/entry). At 1000+ files, reading the full index into the main agent's context wastes tokens and may hit context limits. A sub-agent runs in its own isolated session with a cheap model, reads the index, and returns only the matching entries.
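A back-of-the-envelope version of that estimate, assuming roughly 4 bytes of JSON text per token (a common rule of thumb; real tokenizers vary):

```python
BYTES_PER_ENTRY = 300  # approximate entry size quoted above
BYTES_PER_TOKEN = 4    # assumed rule of thumb, not a measured figure


def index_cost(num_entries: int) -> tuple:
    """Return (approx. index size in bytes, approx. tokens needed to read it)."""
    size_bytes = num_entries * BYTES_PER_ENTRY
    return size_bytes, size_bytes // BYTES_PER_TOKEN


print(index_cost(1000))  # (300000, 75000)
```

So a 1000-file index costs on the order of tens of thousands of tokens every time it is read, which is the cost the sub-agent keeps out of the main session.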
Use sessions_spawn with:
- mode: run
- model: A lightweight model is recommended (the search task is simple). Resolution order:
  1. Explicit model param on sessions_spawn (if provided)
  2. agents.defaults.subagents.model in config (if set)
  3. Falls back to the main agent's model
- task: The prompt below, with the user's query filled in:
You are a file search agent. Read ~/claw-drive/INDEX.jsonl and find entries matching this query:
"<USER_QUERY>"
Return ONLY valid JSON, no explanation:
{
"matches": [
{
"path": "<path from index>",
"desc": "<desc from index>",
"date": "<date from index>",
"tags": ["<tags from index>"],
"confidence": "high|medium|low"
}
],
"total_indexed": <number of entries in index>,
"query": "<original query>"
}
Rules:
- Max 5 matches, sorted by relevance
- confidence: high = exact match, medium = likely relevant, low = tangential
- If no matches, return {"matches": [], "total_indexed": N, "query": "..."}
- Only read INDEX.jsonl, never read file contents
Prepend ~/claw-drive/ to each returned path to get the full file path. To send a file, copy it into the workspace, send it, then remove the copy:
cp ~/claw-drive/<path> ~/.openclaw/workspace/
# send via message tool
rm ~/.openclaw/workspace/<filename>
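If the sub-agent's reply is post-processed by a helper script, a validation step like the sketch below can catch malformed output before any path is used. It checks the schema from the prompt above (at most 5 matches, confidence one of high/medium/low), rejects absolute paths or paths containing .. so nothing outside the drive can be copied, and joins each path onto the drive root. resolve_matches is a hypothetical helper, not part of claw-drive or OpenClaw.

```python
import json
from pathlib import Path, PurePosixPath

VALID_CONFIDENCE = {"high", "medium", "low"}


def resolve_matches(raw: str, root: Path) -> list:
    """Validate the search sub-agent's JSON reply; return (full path, confidence) pairs."""
    data = json.loads(raw)
    matches = data.get("matches")
    if not isinstance(matches, list) or len(matches) > 5:
        raise ValueError("expected a 'matches' list with at most 5 entries")
    resolved = []
    for m in matches:
        rel = PurePosixPath(m["path"])
        # Index paths are relative to the drive root; refuse anything that could escape it.
        if rel.is_absolute() or ".." in rel.parts:
            raise ValueError(f"unsafe path in result: {m['path']!r}")
        if m.get("confidence") not in VALID_CONFIDENCE:
            raise ValueError(f"unexpected confidence: {m.get('confidence')!r}")
        resolved.append((root.joinpath(*rel.parts), m["confidence"]))
    return resolved
```

Typical use: resolve_matches(reply, Path("~/claw-drive").expanduser()).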
If sessions_spawn returns "pairing required", the sub-agent's exec harness needs device-pairing approval. Run:
openclaw devices list # find the pending request
openclaw devices approve <request-id>
This is a one-time setup — once approved, subsequent spawns work without re-pairing.
INDEX.jsonl is a JSONL file — one JSON object per line. Each entry has: date, path, desc, tags (array), source, and optional fields metadata (JSON), original_name, correspondent.
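For maintenance scripts that run outside the main agent session (in keeping with the advice not to load the index into the main context), a minimal reader for this format might look like the sketch below. The required-field check mirrors the field list above; load_index is a hypothetical name, and the date format used in the test data is an assumption.

```python
import json

REQUIRED = ("date", "path", "desc", "tags", "source")


def load_index(lines):
    """Parse INDEX.jsonl lines (one JSON object per line), checking required fields."""
    entries = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        missing = [k for k in REQUIRED if k not in entry]
        if missing:
            raise ValueError(f"line {lineno}: missing fields {missing}")
        if not isinstance(entry["tags"], list):
            raise ValueError(f"line {lineno}: tags must be an array")
        entries.append(entry)
    return entries
```

Typical use: with open(index_path, encoding="utf-8") as f: entries = load_index(f).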
claw-drive update <path> --desc "new description" --tags "new, tags"
Both --desc and --tags are optional, but at least one is required. The index is rewritten atomically using jq.
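The CLI does this rewrite with jq; the Python sketch below only illustrates the same write-to-temp-then-rename pattern and is not the CLI's code. In practice, use claw-drive update. Writing every entry to a temporary file in the same directory and then calling os.replace means a reader sees either the old index or the new one, never a half-written file. update_entry is a hypothetical name, and splitting --tags on commas is an assumption based on the 'tag1, tag2' format.

```python
import json
import os
import tempfile


def update_entry(index_path, target, desc=None, tags=None):
    """Rewrite INDEX.jsonl with one entry changed, replacing the file atomically."""
    if desc is None and tags is None:
        raise ValueError("at least one of desc or tags is required")
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(index_path)))
    try:
        found = False
        with os.fdopen(fd, "w", encoding="utf-8") as dst, open(index_path, encoding="utf-8") as src:
            for line in src:
                if not line.strip():
                    continue
                entry = json.loads(line)
                if entry.get("path") == target:
                    found = True
                    if desc is not None:
                        entry["desc"] = desc
                    if tags is not None:
                        entry["tags"] = [t.strip() for t in tags.split(",") if t.strip()]
                dst.write(json.dumps(entry, ensure_ascii=False) + "\n")
        if not found:
            raise KeyError(f"no index entry for {target!r}")
        # os.replace is atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_path, index_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # never leave a stray temp file behind
        raise
```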
claw-drive delete <path> --force
Without --force, shows what would be deleted (dry run). With --force, removes file + index entry + dedup hash.
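The dry-run-by-default behavior can be illustrated with the sketch below. The real CLI manages INDEX.jsonl and the hash ledger itself; here a list of dicts and a {hash: path} dict stand in for them, and delete_file is a hypothetical name. Use claw-drive delete in practice; the sketch only shows the pattern of reporting first and acting only when forced.

```python
from pathlib import Path


def delete_file(rel_path, root, index, ledger, force=False):
    """Return the planned actions; perform them only when force=True.

    index: list of entry dicts; ledger: dict mapping content hash -> stored path
    (simplified stand-ins for the real INDEX.jsonl and hash ledger).
    """
    hashes = [h for h, p in ledger.items() if p == rel_path]
    actions = [f"delete {rel_path}", f"drop index entry {rel_path}"]
    actions += [f"forget hash {h}" for h in hashes]
    if not force:
        return actions  # dry run: report only, change nothing
    (Path(root) / rel_path).unlink(missing_ok=True)
    index[:] = [e for e in index if e.get("path") != rel_path]
    for h in hashes:
        del ledger[h]
    return actions
```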
Tags add cross-category searchability. A file lives in one folder but can have multiple tags.
Guidelines:
- Include the category as a tag (e.g. medical for files in medical/)
- Subjects such as people or pets (e.g. my-cat)
- Document type (invoice, receipt, report)
- Context (emergency, tax-2025)
Examples:
# Insurance PDF — after extracting: policy number, vehicle, VIN, dates, agent
claw-drive store file.pdf -c insurance -n "acme-auto-id-cards.pdf" \
-d "Acme Insurance ID cards - 2024 Honda Civic, VIN 1HGBH41JXMN109186, Policy ****3441, effective 1/21/2026–7/21/2026, agent Jane Smith (555) 123-4567" \
-t "insurance, auto, acme, id-card, honda-civic, california" -s telegram
# Vet invoice — after extracting: clinic, amount, diagnosis, pet name
claw-drive store invoice.pdf -c medical -n "my-cat-vet-invoice-2026-02-15.pdf" \
-d 'VEG emergency visit invoice - Max (cat), $1,234.56, bronchial pattern diagnosis, prednisolone prescribed' \
-t "medical, invoice, max, emergency, vet" -s email
# W-2 — after extracting: employer, tax year, wages
claw-drive store w2.pdf -c finance -n "w2-2025.pdf" \
-d 'W-2 tax form 2025 - Employer: Acme Corp, wages $120,000' \
-t "finance, tax-2025, w2" -s email
# Sensitive file — user said "keep it private" or didn't reply
claw-drive store scan.pdf -c identity -n "passport-scan-2026.pdf" \
-d "Passport scan" \
-t "identity, passport" -s telegram
# Sensitive file — user provided brief description
claw-drive store doc.pdf -c contracts -n "apartment-lease-2026.pdf" \
-d "Apartment lease agreement, signed Jan 2026" \
-t "contracts, lease, housing" -s email
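Because a file lives in one folder but can carry several tags, a tag query cuts across categories. The sketch below assumes entries shaped like the INDEX.jsonl records described earlier, with paths of the form category/filename; find_by_tags is a hypothetical helper, and in the actual workflow the search sub-agent does the lookup.

```python
def find_by_tags(entries, *wanted):
    """Return paths of entries that carry every requested tag, whatever folder they live in."""
    wanted_set = {t.strip().lower() for t in wanted}
    return [
        e["path"]
        for e in entries
        if wanted_set <= {t.lower() for t in e.get("tags", [])}
    ]
```

With a vet invoice in medical/ and a (hypothetical) plumber invoice in receipts/, both tagged invoice, find_by_tags(entries, "invoice") returns both paths, while find_by_tags(entries, "invoice", "vet") narrows to the vet invoice.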
Example of a clean filename: my-cat-vet-invoice-2026-02-15.pdf
Categories are not fixed: the agent can create any category that makes sense. The CLI runs mkdir -p automatically. These are the defaults created by init, but use whatever fits:
| Category | Use for |
|----------|---------|
| documents | General docs, letters, forms, manuals |
| finance | Tax returns, bank statements, investment docs, pay stubs |
| insurance | Insurance policies, claims, coverage documents |
| medical | Health records, lab results, prescriptions, pet health |
| travel | Boarding passes, itineraries, hotel bookings, visas |
| identity | Passport scans, birth certs, SSN docs (⚠️ sensitive) |
| receipts | Purchase receipts, warranties, service invoices |
| contracts | Leases, employment agreements, legal docs |
| photos | Personal photos, document scans |
| misc | Anything that doesn't fit above |
Need housing/, work/, or pets/? Just use them; each directory is created on first store.
When in doubt: misc/ is fine. Better to store it somewhere than not at all.
Bulk-import files from an existing directory:
# 1. Scan source directory into a plan
claw-drive migrate scan ~/messy-folder plan.json
# 2. Agent classifies each file (fills in category, name, tags, description in the JSON)
# 3. Review
claw-drive migrate summary plan.json
# 4. Dry run
claw-drive migrate apply plan.json --dry-run
# 5. Execute
claw-drive migrate apply plan.json
The plan JSON contains one entry per file with category, name, tags, description fields (initially null). The agent fills these in using the same extract-first approach, then apply copies files with full dedup and indexing.
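Before running apply, it can help to confirm the agent has filled in every field. This sketch assumes the plan's top level is a JSON array of entry objects using the four field names listed above; the real layout written by claw-drive migrate scan may differ, and unfilled_entries is a hypothetical helper.

```python
import json

REQUIRED_FIELDS = ("category", "name", "tags", "description")


def unfilled_entries(plan_text):
    """Return (position, missing fields) for plan entries that are not fully classified."""
    entries = json.loads(plan_text)  # assumption: a JSON array of entries
    problems = []
    for i, entry in enumerate(entries):
        missing = [f for f in REQUIRED_FIELDS if entry.get(f) in (None, "", [])]
        if missing:
            problems.append((i, missing))
    return problems
```

An empty result means every entry is classified and the dry run is worth running.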
Claw Drive can auto-sync to Google Drive (or any rclone-supported backend) via a background daemon.
brew install rclone fswatch
Run claw-drive sync auth. It opens a browser on the machine for Google sign-in.
What happens:
- Credentials are stored locally in ~/.config/rclone/rclone.conf and are never sent to any third party.
Agent behavior during auth:
- Run claw-drive sync auth in the background.
claw-drive sync setup # verify deps and config
claw-drive sync start # start background daemon (fswatch + rclone)
claw-drive sync stop # stop daemon
claw-drive sync push # manual one-shot sync
claw-drive sync status # show sync status
The daemon watches the drive directory for file changes and syncs to the remote within seconds. It runs as a launchd service — starts on login, restarts on failure.
Logs: ~/Library/Logs/claw-drive/sync.log
Use the exclude list in .sync-config to keep sensitive directories local-only. identity/ is excluded by default.
Check index ↔ disk ↔ hash consistency:
claw-drive verify # report issues
claw-drive verify --fix # auto-repair what's fixable
Auto-fixable: missing on disk (removes stale index entry), missing hash (re-registers).
Manual review: orphan files (no metadata to index), hash mismatches (possible corruption).
Run verify after manual file operations or when something seems off.
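To make the issue classes concrete, here is a sketch of a comparison in the spirit of verify, using plain Python sets. The inputs are simplified stand-ins (the real ledger format is not documented here), and check_consistency is a hypothetical helper; in practice, run claw-drive verify.

```python
def check_consistency(index_paths, disk_hashes, ledger):
    """Classify index/disk/ledger inconsistencies.

    index_paths: paths listed in INDEX.jsonl
    disk_hashes: {path: freshly computed content hash} for files on disk
    ledger: {content hash: path} as recorded in the hash ledger
    """
    indexed = set(index_paths)
    on_disk = set(disk_hashes)
    recorded = {p: h for h, p in ledger.items()}
    both = indexed & on_disk
    return {
        # Auto-fixable per the text above.
        "missing_on_disk": sorted(indexed - on_disk),
        "missing_hash": sorted(p for p in both if p not in recorded),
        # Need manual review.
        "orphans": sorted(on_disk - indexed),
        "hash_mismatch": sorted(p for p in both if p in recorded and recorded[p] != disk_hashes[p]),
    }
```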
- PDF text extraction: uv run --with pymupdf python3 -c "import pymupdf; ..."
- Run claw-drive status to see file counts, size, and sync status.
Before storing any file, verify:
- If the destination is identity/: am I skipping extraction regardless? (must be yes)
Generated Mar 1, 2026
Freelancers receive client contracts, invoices, and project files through various channels. Claw Drive automatically categorizes these documents by client and project type, extracts key details like dates and amounts, and makes them searchable by natural language queries like 'Q3 invoices for Acme Corp'.
Individuals receive bank statements, tax documents, insurance policies, and receipts via email attachments. Claw Drive categorizes these into financial categories, extracts account numbers and dates, and creates searchable descriptions while maintaining privacy for sensitive financial data through user consent workflows.
Researchers collect PDF articles, data files, and reference materials from various sources. Claw Drive extracts text from PDFs, categorizes by research topic or project, tags with author names and publication years, and enables retrieval through queries like 'climate change studies from 2023'.
Business owners receive regulatory documents, licenses, permits, and compliance certificates that need organized storage. Claw Drive categorizes by document type and agency, extracts expiration dates and reference numbers, and ensures files are findable when audit time approaches.
Families accumulate important documents like birth certificates, medical records, school reports, and property deeds. Claw Drive organizes these into identity, medical, and education categories with strict privacy controls for sensitive documents, making them retrievable without exposing private information.
Offer basic file organization and search capabilities for free, with premium features like advanced AI categorization, Google Drive sync, and team collaboration tools as paid subscriptions. Revenue comes from monthly subscriptions for power users and small teams.
Sell customized versions to businesses needing secure document management with compliance features. Include advanced security controls, audit logging, and integration with existing enterprise systems like SharePoint or Salesforce through API access.
Offer the AI categorization engine as an API service that other applications can integrate. Developers pay based on API calls for file processing, text extraction, and intelligent tagging while maintaining their own storage solutions.
💬 Integration Tip
Always use the claw-drive CLI for file operations, never direct file system commands, to maintain index consistency and proper deduplication.