Clawd Cursor

If a human can do it on a screen, you can too. No API? No integration? No problem.

Every app already has a UI — Clawd Cursor gives you eyes and hands to use all of them.

No task is impossible. If it's on the screen, it's within reach.

IMPORTANT — READ THIS BEFORE ANYTHING ELSE

You have a desktop AI agent over GUI running on the user's machine. This means:

1. You CAN open apps, click buttons, type text, fill forms

2. You CAN see what's on screen (via Clawd Cursor's screenshot pipeline)

3. You CAN interact with applications — browser, desktop apps, settings

4. You MUST respect safety tiers — 🟢 Auto actions run freely, 🔴 Confirm actions require user approval

5. You MUST ask before accessing sensitive apps (email, banking, private messages, password managers)

6. You SHOULD prefer direct tools (file read, web_fetch, CLI) over desktop automation when available

Think of Clawd Cursor as your hands and eyes on the user's computer — used with their permission.

When to Use This (AGENTS: READ THIS)

You have a desktop UI agent available. Use it when the user asks you to interact with their desktop, or when no other tool can accomplish the task.

Use Clawd Cursor for:

User-requested desktop tasks — "open Gmail and send an email", "check my calendar"
Read a webpage — when web_fetch or browser tools aren't available
Interact with desktop apps — click buttons, fill forms, read results
Browser tasks — search, navigate, fill forms (when browser tool unavailable)
Visual verification — did the page load? what does the UI show?
Cross-app workflows — copy from one app, paste in another
Settings changes — when the user explicitly asks

⚠️ Sensitive App Policy

Always ask the user before accessing:

Email clients (Gmail, Outlook)
Banking or financial apps
Private messaging (WhatsApp, Signal, Telegram)
Password managers
Admin panels or cloud consoles

Don't use Clawd Cursor when:

You can do it with a direct API call or CLI command (faster)
The task is purely computational (math, text generation, code writing)
You can already read/write the file directly
The browser tool or web_fetch can handle it

Direct Browser Access (Fast Path)

For quick page reads without a full task, connect to Chrome via Playwright CDP:

const pw = require('playwright');
const browser = await pw.chromium.connectOverCDP('http://127.0.0.1:9222');
const pages = browser.contexts()[0].pages();
const text = await pages[0].innerText('body');

Use this when you just need page content — faster than sending a task.

| Scenario | Use | Why |

|----------|-----|-----|

| Read page content/text | CDP Direct | Instant, free |

| Fill a web form | REST API | Clawd handles multi-step planning |

| Check if a page loaded | CDP Direct | Just read the title/URL |

| Click through a complex UI flow | REST API | Clawd handles planning |

| Get a list of elements on page | CDP Direct | Fast DOM query |

| Interact with a desktop app | REST API | CDP is browser-only |

REST API Reference

Base URL: http://127.0.0.1:3847

Note: On Windows PowerShell, use curl.exe (with .exe) or Invoke-RestMethod. Bare curl is aliased to Invoke-WebRequest which behaves differently.

Pre-flight Check

Before your first task, verify Clawd Cursor is running:

curl.exe -s http://127.0.0.1:3847/health

Expected: {"status":"ok","version":"0.6.0"}

If connection refused — start it yourself (don't ask the user):

# Find the skill directory and start the server
Start-Process -FilePath "node" -ArgumentList "dist/index.js","start" -WorkingDirectory "<clawd-cursor-directory>" -WindowStyle Hidden
Start-Sleep 3
# Verify it's running
curl.exe -s http://127.0.0.1:3847/health

The skill directory is wherever SKILL.md lives (the parent of this file). Use that path as the working directory.

Sending a Task (Async — Returns Immediately)

POST /task accepts the task and returns immediately. The task runs in the background. You must poll /status to know when it's done.

curl.exe -s -X POST http://127.0.0.1:3847/task -H "Content-Type: application/json" -d "{\"task\": \"YOUR_TASK_HERE\"}"

PowerShell:

Invoke-RestMethod -Uri http://127.0.0.1:3847/task -Method POST -ContentType "application/json" -Body '{"task": "YOUR_TASK_HERE"}'

Polling Pattern (Follow This)

1. POST /task → get accepted
2. Wait 2 seconds
3. GET /status
4. If status is "idle" → done
5. If status is "waiting_confirm" → ASK THE USER, then POST /confirm based on their answer
6. If still running → wait 2 more seconds, go to step 3
7. If 60+ seconds → POST /abort and retry with clearer instructions

Checking Status

curl.exe -s http://127.0.0.1:3847/status

Confirming Safety-Gated Actions

Some actions (sending messages, deleting) require approval. 🔴 NEVER self-approve these. Always ask the user for confirmation before POST /confirm. These exist to protect the user — do not bypass them.

curl.exe -s -X POST http://127.0.0.1:3847/confirm -H "Content-Type: application/json" -d "{\"approved\": true}"

Aborting a Task

curl.exe -s -X POST http://127.0.0.1:3847/abort

Reading Logs (Debugging)

curl.exe -s http://127.0.0.1:3847/logs

Returns last 200 log entries. Check for error or warn entries when tasks fail.

Response States

| State | Response | What to do |

|-------|----------|------------|

| Accepted | {"accepted": true, "task": "..."} | Start polling |

| Running | {"status": "acting", "currentTask": "...", "stepsCompleted": 2} | Keep polling |

| Waiting confirm | {"status": "waiting_confirm", "currentStep": "..."} | POST /confirm |

| Done | {"status": "idle"} | Task complete |

| Busy | {"error": "Agent is busy", "state": {...}} | Wait or POST /abort first |

CDP Direct Reference

Chrome must be running with --remote-debugging-port=9222.

Quick check:

curl.exe -s http://127.0.0.1:9222/json/version

If this returns JSON, Chrome is ready.

Connecting via Playwright:

const { chromium } = require('playwright');
const browser = await chromium.connectOverCDP('http://127.0.0.1:9222');
const context = browser.contexts()[0];
const page = context.pages()[0];

// Read page content
const title = await page.title();
const url = page.url();
const text = await page.textContent('body');

// Click by role
await page.getByRole('button', { name: 'Submit' }).click();

// Fill a field
await page.getByLabel('Email').fill('user@example.com');

// Read specific elements
const buttons = await page.$eval('button', els => els.map(e => e.textContent));

Task Writing Guidelines

Be specific — include app names, URLs, exact text to type, button names
One task at a time — wait for completion before sending the next
Describe the goal, not the clicks — say "Send an email to john@example.com about the meeting" not "click compose, click to field..."
Check status if a task seems to hang
Don't include credentials in task text — tasks are logged

Task Examples

| Goal | Task to send |

|------|-------------|

| Simple navigation | Open Chrome and go to github.com |

| Read screen content | What text is currently displayed in Notepad? |

| Cross-app workflow | Copy the email address from the Chrome tab and paste it into the To field in Outlook |

| Form filling | In the open Chrome tab, fill the contact form: name "John Doe", email "john@example.com" |

| App interaction | Open Spotify and play the Discover Weekly playlist |

| Settings change | Open Windows Settings and turn on Dark Mode |

| Data extraction | Read the stock price shown in the Bloomberg tab in Chrome |

| Complex browser | Open YouTube, search for "Adele Hello", and play the first video result |

| Verification | Check if the deployment succeeded — look at the Vercel dashboard in Chrome |

| Send email | Open Gmail, compose email to john@example.com, subject: Meeting Tomorrow, body: Confirming 2pm. Best regards. |

| Take screenshot | Take a screenshot |

Error Recovery

| Problem | Solution |

|---------|----------|

| Connection refused on :3847 | Start Clawd Cursor: cd clawd-cursor && npm start |

| Connection refused on :9222 | Start Chrome with CDP: Start-Process chrome -ArgumentList "--remote-debugging-port=9222" |

| Agent returns "busy" | Poll /status — wait for idle, or POST /abort |

| Task fails with no details | Check /logs for error entries |

| Task completes but wrong result | Rephrase with more specifics: exact app name, button text, field labels |

| Same task fails repeatedly | Break into smaller tasks (one action per task) |

| Safety confirmation pending | POST /confirm with {"approved": true} or {"approved": false} |

| Task hangs > 60 seconds | POST /abort, then retry with simpler phrasing |

How It Works — 4-Layer Pipeline

|-------|------|-------|------|

80%+ of tasks handled by Layer 0-1 (free, instant). Vision model is last resort only.

Safety Tiers

| Tier | Actions | Behavior |

|------|---------|----------|

| 🟢 Auto | Navigation, reading, opening apps | Runs immediately |

| 🟡 Preview | Typing, form filling | Logs before executing |

| 🔴 Confirm | Sending messages, deleting | Pauses — ask the user before POST /confirm. Never self-approve. |

Security & Privacy

Network Isolation

API binds to 127.0.0.1 only — not network accessible. Verify: netstat -an | findstr 3847 should show 127.0.0.1:3847
Screenshots stay in memory, never saved to disk (unless --debug)
No telemetry, no analytics, no phone-home calls

Data Flow

With Ollama (local): 100% offline — zero external network calls. No data leaves the machine.
With cloud providers: screenshots/text are sent to the user's chosen provider API only. No data goes to skill authors, ClawHub, or third parties.
OpenClaw users: credentials auto-discovered from local config files — no keys stored in skill directory.
The user controls data flow by choosing their provider. Ollama = fully private.

Agent Autonomy Controls

🟢 Auto actions (navigation, reading, opening apps) run without prompting
🟡 Preview actions (typing, form filling) are logged before executing
🔴 Confirm actions (sending messages, deleting, purchases) always pause for user approval
Agents must ask the user before accessing sensitive apps (email, banking, messaging, passwords)
Agents must never self-approve 🔴 Confirm actions

Setup (User Reference)

Setup is handled by the user. If Clawd Cursor isn't running, start it yourself using the exec tool:

Start-Process -FilePath "node" -ArgumentList "dist/index.js","start" -WorkingDirectory "<skill-directory>" -WindowStyle Hidden

Only ask the user if you cannot start it (e.g., node not installed, build missing).

git clone https://github.com/AmrDab/clawd-cursor.git
cd clawd-cursor
npm install && npm run build
npx clawd-cursor doctor    # auto-detects and configures everything
npm start                  # starts on port 3847

macOS: Grant Accessibility permission to terminal: System Settings → Privacy & Security → Accessibility

| Provider | Setup | Cost |

|----------|-------|------|

| Ollama (free) | ollama pull | $0 (fully offline) |

| Any cloud provider | Set AI_API_KEY=your-key | Varies by provider |

| OpenClaw users | Automatic — no setup needed | Uses configured provider |

Performance Optimization

Proven optimizations applied to reduce task execution latency and LLM API costs. Reference files in perf/references/patches/.

Applied Optimizations

| # | Name | Impact |

|---|------|--------|

| 1 | Screenshot hash cache | 90% fewer LLM calls on static screens |

| 2 | Parallel screenshot+a11y | 30-40% per-step latency cut |

| 3 | A11y context cache (2s TTL) | Eliminates redundant PS spawns |

| 4 | Screenshot compression | 52% smaller payload (58KB vs 120KB) |

| 5 | Async debug writes | 94% less event loop blocking |

| 6 | Streaming LLM responses | 1-3s faster per LLM call |

| 7 | Trimmed system prompts | ~60% fewer prompt tokens |

| 8 | A11y tree filtering | Interactive elements only, 3000 char cap |

| 9 | Combined PS script | 1 spawn instead of 3 |

| 10 | Taskbar cache (30s TTL) | Skip expensive taskbar query |

| 11 | Delay reduction | 50-150ms vs 200-1500ms |

Benchmarks (2560x1440)

|--------|------------|---------------|----------------------|

| Screenshot capture | ~850ms | ~50ms | ~57ms |

| Screenshot size | ~200KB | ~120KB | ~58KB |

| A11y context (uncached) | N/A | ~600ms | ~462ms |

| A11y context (cached) | N/A | 0ms | 0ms (2s TTL) |

| Delays (per step) | N/A | 200-1500ms | 50-600ms |

| System prompt tokens | N/A | ~800 | ~300 |

Perf Tools

perf/apply-optimizations.ps1 — apply all patches
perf/perf-test.ts — benchmark harness (npx ts-node perf/perf-test.ts)

ClawdCursorv0.6.3

Install & Quick Start