clawd-cursor
AI desktop agent: control any app on Windows/macOS from your OpenClaw agent. Send natural language tasks to the Clawd Cursor API and it handles everything:...
Install via ClawdBot CLI:

```bash
clawdbot install AmrDab/clawd-cursor
```

If a human can do it on a screen, you can too. No API? No integration? No problem.
Every app already has a UI — Clawd Cursor gives you eyes and hands to use all of them.
No task is impossible. If it's on the screen, it's within reach.
IMPORTANT: READ THIS BEFORE ANYTHING ELSE

> You have a desktop AI agent running on the user's machine that operates through the GUI. This means:
> 1. You CAN open apps, click buttons, type text, and fill forms.
> 2. You CAN see what's on screen (via Clawd Cursor's screenshot pipeline).
> 3. You CAN interact with applications: browser, desktop apps, settings.
> 4. You MUST respect safety tiers: 🟢 Auto actions run freely; 🔴 Confirm actions require user approval.
> 5. You MUST ask before accessing sensitive apps (email, banking, private messages, password managers).
> 6. You SHOULD prefer direct tools (file read, web_fetch, CLI) over desktop automation when available.
>
> Think of Clawd Cursor as your hands and eyes on the user's computer, used with their permission.
You have a desktop UI agent available. Use it when the user asks you to interact with their desktop, or when no other tool can accomplish the task.
Always ask the user before accessing sensitive apps: email, banking, private messages, and password managers.
For quick page reads without a full task, connect to Chrome via Playwright CDP:
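```javascript
const pw = require('playwright');

(async () => {
  // Attach to the Chrome instance already listening on the CDP port
  const browser = await pw.chromium.connectOverCDP('http://127.0.0.1:9222');
  const pages = browser.contexts()[0].pages();
  const text = await pages[0].innerText('body');
  console.log(text);
})();
```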
Use this when you just need page content — faster than sending a task.
| Scenario | Use | Why |
|----------|-----|-----|
| Read page content/text | CDP Direct | Instant, free |
| Fill a web form | REST API | Clawd handles multi-step planning |
| Check if a page loaded | CDP Direct | Just read the title/URL |
| Click through a complex UI flow | REST API | Clawd handles planning |
| Get a list of elements on page | CDP Direct | Fast DOM query |
| Interact with a desktop app | REST API | CDP is browser-only |
Base URL: http://127.0.0.1:3847
Note: On Windows PowerShell, use `curl.exe` (with `.exe`) or `Invoke-RestMethod`. Bare `curl` is aliased to `Invoke-WebRequest`, which behaves differently.
Before your first task, verify Clawd Cursor is running:
```powershell
curl.exe -s http://127.0.0.1:3847/health
```
Expected: {"status":"ok","version":"0.6.0"}
If the connection is refused, start it yourself (don't ask the user):

```powershell
# Find the skill directory and start the server
Start-Process -FilePath "node" -ArgumentList "dist/index.js","start" -WorkingDirectory "<skill-directory>" -WindowStyle Hidden
Start-Sleep 3
# Verify it's running
curl.exe -s http://127.0.0.1:3847/health
```
The skill directory is the folder that contains SKILL.md (this file). Use that path as the working directory.
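The same health check and start-up can be done from Node.js. This is a minimal sketch, assuming Node.js 18+ (for the global `fetch`); `isHealthy` and `ensureRunning` are hypothetical helper names, not part of Clawd Cursor, and the fixed 3-second wait mirrors `Start-Sleep 3` above. The fetch and spawn functions are injectable so the logic can be exercised without a real server.

```javascript
// Sketch: probe GET /health and, if the server is down, start it the same way
// the PowerShell snippet does (node dist/index.js start, in the skill directory).
// Assumes Node.js 18+ (global fetch). isHealthy/ensureRunning are hypothetical names.
const { spawn } = require('node:child_process');

const BASE_URL = 'http://127.0.0.1:3847';

async function isHealthy(fetchImpl = fetch, baseUrl = BASE_URL) {
  try {
    const res = await fetchImpl(`${baseUrl}/health`);
    const body = await res.json();
    return body.status === 'ok';
  } catch {
    return false; // connection refused, non-JSON reply, etc.
  }
}

async function ensureRunning(skillDir, {
  fetchImpl = fetch,
  spawnImpl = spawn,
  waitMs = 3000, // same pause as Start-Sleep 3
} = {}) {
  if (await isHealthy(fetchImpl)) return true;
  // Start in the background with no console window, like -WindowStyle Hidden.
  const child = spawnImpl('node', ['dist/index.js', 'start'], {
    cwd: skillDir,
    detached: true,
    stdio: 'ignore',
    windowsHide: true,
  });
  child.unref(); // let this script exit while the server keeps running
  await new Promise((resolve) => setTimeout(resolve, waitMs));
  return isHealthy(fetchImpl);
}
```

If `ensureRunning` still returns `false`, that is the point at which to ask the user (for example, Node is missing or the build has not been run).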
POST /task accepts the task and returns immediately. The task runs in the background. You must poll /status to know when it's done.
```powershell
curl.exe -s -X POST http://127.0.0.1:3847/task -H "Content-Type: application/json" -d "{\"task\": \"YOUR_TASK_HERE\"}"
```
PowerShell:
```powershell
Invoke-RestMethod -Uri http://127.0.0.1:3847/task -Method POST -ContentType "application/json" -Body '{"task": "YOUR_TASK_HERE"}'
```
1. POST /task → get accepted
2. Wait 2 seconds
3. GET /status
4. If status is "idle" → done
5. If status is "waiting_confirm" → ASK THE USER, then POST /confirm based on their answer
6. If still running → wait 2 more seconds, go to step 3
7. If 60+ seconds have passed → POST /abort and retry with clearer instructions
```powershell
curl.exe -s http://127.0.0.1:3847/status
```
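The polling protocol above can be sketched as a single async function. This is an illustration under stated assumptions, not Clawd Cursor's own client: it assumes Node.js 18+ (global `fetch`), and `runTask` and the `askUser` callback are hypothetical names. `askUser` must return the user's actual answer; the function refuses to continue past `waiting_confirm` without it rather than approving on its own.

```javascript
// Sketch of the POST /task → poll GET /status loop described above.
// Assumes Node.js 18+ (global fetch); runTask and askUser are hypothetical names.
const BASE_URL = 'http://127.0.0.1:3847';

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runTask(task, {
  askUser,           // (currentStep) => Promise<boolean>: the user's answer, never the agent's
  fetchImpl = fetch,
  baseUrl = BASE_URL,
  pollMs = 2000,     // step 2/6: wait 2 seconds between polls
  timeoutMs = 60000, // step 7: give up after 60 seconds
} = {}) {
  const post = (path, body) => {
    const init = { method: 'POST' };
    if (body !== undefined) {
      init.headers = { 'Content-Type': 'application/json' };
      init.body = JSON.stringify(body);
    }
    return fetchImpl(`${baseUrl}${path}`, init);
  };

  const accepted = await (await post('/task', { task })).json();
  if (!accepted.accepted) {
    throw new Error(accepted.error || 'Task was not accepted');
  }

  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    await sleep(pollMs);
    const state = await (await fetchImpl(`${baseUrl}/status`)).json();
    if (state.status === 'idle') return 'done';
    if (state.status === 'waiting_confirm') {
      if (typeof askUser !== 'function') {
        throw new Error('Confirmation required: ask the user, do not self-approve');
      }
      const approved = await askUser(state.currentStep);
      await post('/confirm', { approved: approved === true });
    }
    // Any other status (e.g. "acting") means still running: keep polling.
  }

  await post('/abort'); // budget exhausted; retry later with clearer wording
  return 'aborted';
}
```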
Some actions (sending messages, deleting) require approval. 🔴 NEVER self-approve these. Always ask the user for confirmation before POST /confirm. These exist to protect the user — do not bypass them.
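```powershell
# Approve a pending action (only after the user says yes; send {"approved": false} to reject)
curl.exe -s -X POST http://127.0.0.1:3847/confirm -H "Content-Type: application/json" -d "{\"approved\": true}"
# Abort the current task
curl.exe -s -X POST http://127.0.0.1:3847/abort
# Fetch recent logs
curl.exe -s http://127.0.0.1:3847/logs
```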
/logs returns the last 200 log entries. When a task fails, check them for error or warn entries.
| State | Response | What to do |
|-------|----------|------------|
| Accepted | {"accepted": true, "task": "..."} | Start polling |
| Running | {"status": "acting", "currentTask": "...", "stepsCompleted": 2} | Keep polling |
| Waiting confirm | {"status": "waiting_confirm", "currentStep": "..."} | POST /confirm |
| Done | {"status": "idle"} | Task complete |
| Busy | {"error": "Agent is busy", "state": {...}} | Wait or POST /abort first |
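The table above can be encoded as a small decision function. This is a sketch, assuming only the response fields shown in the table; `nextStep` and its return labels are hypothetical, and treating unlisted statuses as "still running" is an assumption rather than documented behavior.

```javascript
// Sketch: map each documented response shape to the next step from the table.
// nextStep and its labels are hypothetical; field names come from the table.
function nextStep(response) {
  if (response.accepted === true) return 'poll';                     // Accepted
  if (response.error === 'Agent is busy') return 'wait_or_abort';    // Busy
  if (response.error) return 'check_logs';                           // other errors: see GET /logs
  switch (response.status) {
    case 'idle':
      return 'done';                                                 // Done
    case 'waiting_confirm':
      return 'ask_user_then_confirm';                                // never self-approve
    default:
      return 'poll'; // "acting" and any other in-progress status (assumption)
  }
}
```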
Chrome must be running with --remote-debugging-port=9222.
```powershell
curl.exe -s http://127.0.0.1:9222/json/version
```
If this returns JSON, Chrome is ready.
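The same readiness probe from Node.js, as a minimal sketch assuming Node.js 18+ (global `fetch`). `chromeReady` is a hypothetical name; it relies on Chrome's DevTools HTTP endpoint `/json/version`, whose JSON includes a `webSocketDebuggerUrl` field for the browser target.

```javascript
// Sketch: the curl.exe readiness check above, done with fetch.
// chromeReady is a hypothetical helper; /json/version is Chrome's DevTools HTTP endpoint.
async function chromeReady(fetchImpl = fetch, port = 9222) {
  try {
    const res = await fetchImpl(`http://127.0.0.1:${port}/json/version`);
    const info = await res.json();
    // A DevTools endpoint reports a browser-level WebSocket URL here.
    return typeof info.webSocketDebuggerUrl === 'string';
  } catch {
    return false; // nothing listening on the port, or not a DevTools endpoint
  }
}
```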
```javascript
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.connectOverCDP('http://127.0.0.1:9222');
  const context = browser.contexts()[0];
  const page = context.pages()[0];

  // Read page content
  const title = await page.title();
  const url = page.url();
  const text = await page.textContent('body');

  // Click by role
  await page.getByRole('button', { name: 'Submit' }).click();

  // Fill a field
  await page.getByLabel('Email').fill('user@example.com');

  // Read specific elements ($$eval receives all matches; $eval only the first)
  const buttons = await page.$$eval('button', els => els.map(e => e.textContent));
})();
```
| Goal | Task to send |
|------|-------------|
| Simple navigation | Open Chrome and go to github.com |
| Read screen content | What text is currently displayed in Notepad? |
| Cross-app workflow | Copy the email address from the Chrome tab and paste it into the To field in Outlook |
| Form filling | In the open Chrome tab, fill the contact form: name "John Doe", email "john@example.com" |
| App interaction | Open Spotify and play the Discover Weekly playlist |
| Settings change | Open Windows Settings and turn on Dark Mode |
| Data extraction | Read the stock price shown in the Bloomberg tab in Chrome |
| Complex browser | Open YouTube, search for "Adele Hello", and play the first video result |
| Verification | Check if the deployment succeeded — look at the Vercel dashboard in Chrome |
| Send email | Open Gmail, compose email to john@example.com, subject: Meeting Tomorrow, body: Confirming 2pm. Best regards. |
| Take screenshot | Take a screenshot |
| Problem | Solution |
|---------|----------|
| Connection refused on :3847 | Start Clawd Cursor: cd clawd-cursor && npm start |
| Connection refused on :9222 | Start Chrome with CDP: Start-Process chrome -ArgumentList "--remote-debugging-port=9222" |
| Agent returns "busy" | Poll /status — wait for idle, or POST /abort |
| Task fails with no details | Check /logs for error entries |
| Task completes but wrong result | Rephrase with more specifics: exact app name, button text, field labels |
| Same task fails repeatedly | Break into smaller tasks (one action per task) |
| Safety confirmation pending | POST /confirm with {"approved": true} or {"approved": false} |
| Task hangs > 60 seconds | POST /abort, then retry with simpler phrasing |
| Layer | What | Speed | Cost |
|-------|------|-------|------|
| 0: Browser Layer | URL detection → direct navigation | Instant | Free |
| 1: Action Router | Regex + UI Automation | Instant | Free |
| 1.5: Smart Interaction | 1 LLM plan → CDP/UIDriver executes | ~2-5s | 1 LLM call |
| 2: Accessibility Reasoner | UI tree → text LLM decides | ~1s | Cheap |
| 3: Computer Use | Screenshot → vision LLM | ~5-8s | Expensive |
80%+ of tasks are handled by Layers 0-1 (free, instant). The vision model is only a last resort.
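The cheapest-first layering can be illustrated with a simple escalation loop. This is a sketch of the general pattern, not Clawd Cursor's router: it assumes each layer is an async handler that returns `null` when it cannot handle a task, and the handlers below are hypothetical stand-ins (only the browser layer does anything, via a toy "go to <domain>" regex).

```javascript
// Sketch of cheapest-first escalation: try each layer in order and stop at the
// first one that returns a result. Handlers are hypothetical stand-ins.
async function routeTask(task, layers) {
  for (const { name, handle } of layers) {
    const result = await handle(task);
    if (result !== null && result !== undefined) {
      return { layer: name, result };
    }
  }
  throw new Error(`No layer could handle: ${task}`);
}

// Example wiring mirroring the table's order (cheap and fast first, vision last).
const layers = [
  { name: '0: Browser Layer', handle: async (t) => (/\bgo to \S+\.\S+/i.test(t) ? 'navigated' : null) },
  { name: '1: Action Router', handle: async () => null },
  { name: '1.5: Smart Interaction', handle: async () => null },
  { name: '2: Accessibility Reasoner', handle: async () => null },
  { name: '3: Computer Use', handle: async () => 'vision fallback' },
];
```

The design choice is that an expensive layer only runs after every cheaper layer has declined, which is why most tasks never reach the vision model.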
| Tier | Actions | Behavior |
|------|---------|----------|
| 🟢 Auto | Navigation, reading, opening apps | Runs immediately |
| 🟡 Preview | Typing, form filling | Logs before executing |
| 🔴 Confirm | Sending messages, deleting | Pauses — ask the user before POST /confirm. Never self-approve. |
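A tier gate following the table can be sketched as below. Only the tier semantics come from this document; the action-type strings, `gate`, and the `askUser` callback are hypothetical, and defaulting unknown actions to the strictest tier is a design assumption.

```javascript
// Sketch of a safety-tier gate. Action-type strings and askUser are hypothetical;
// the auto/preview/confirm behavior follows the table above.
const TIERS = {
  navigate: 'auto', read: 'auto', open_app: 'auto',
  type: 'preview', fill_form: 'preview',
  send_message: 'confirm', delete: 'confirm',
};

async function gate(action, { log = console.log, askUser } = {}) {
  const tier = TIERS[action.type] ?? 'confirm'; // unknown actions: assume the strictest tier
  if (tier === 'preview') log(`[preview] ${action.type}: ${action.detail}`);
  if (tier === 'confirm') {
    // Only a human answer can approve; a missing answer counts as "no".
    const approved = typeof askUser === 'function' && (await askUser(action)) === true;
    return { tier, allowed: approved };
  }
  return { tier, allowed: true };
}
```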
The server listens on 127.0.0.1 only, so it is not network-accessible. Verify with `netstat -an | findstr 3847`, which should show `127.0.0.1:3847`.
Installation is handled by the user. If Clawd Cursor isn't running, start it yourself using the exec tool:
```powershell
Start-Process -FilePath "node" -ArgumentList "dist/index.js","start" -WorkingDirectory "<skill-directory>" -WindowStyle Hidden
```
Only ask the user if you cannot start it (e.g., node not installed, build missing).
```bash
git clone https://github.com/AmrDab/clawd-cursor.git
cd clawd-cursor
npm install && npm run build
npx clawd-cursor doctor   # auto-detects and configures everything
npm start                 # starts on port 3847
```
macOS: grant Accessibility permission to your terminal app in System Settings → Privacy & Security → Accessibility.
| Provider | Setup | Cost |
|----------|-------|------|
| Ollama (free) | ollama pull | $0 (fully offline) |
| Any cloud provider | Set AI_API_KEY=your-key | Varies by provider |
| OpenClaw users | Automatic — no setup needed | Uses configured provider |
The following optimizations reduce task execution latency and LLM API costs. Reference patch files are in perf/references/patches/.
| # | Name | Impact |
|---|------|--------|
| 1 | Screenshot hash cache | 90% fewer LLM calls on static screens |
| 2 | Parallel screenshot+a11y | 30-40% per-step latency cut |
| 3 | A11y context cache (2s TTL) | Eliminates redundant PowerShell spawns |
| 4 | Screenshot compression | 52% smaller payload (58KB vs 120KB) |
| 5 | Async debug writes | 94% less event loop blocking |
| 6 | Streaming LLM responses | 1-3s faster per LLM call |
| 7 | Trimmed system prompts | ~60% fewer prompt tokens |
| 8 | A11y tree filtering | Interactive elements only, 3000 char cap |
| 9 | Combined PowerShell script | 1 spawn instead of 3 |
| 10 | Taskbar cache (30s TTL) | Skip expensive taskbar query |
| 11 | Delay reduction | 50-150ms vs 200-1500ms |
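Two of the caching ideas in the table (the screenshot hash cache, #1, and the short-TTL context caches, #3 and #10) can be sketched as follows. This is an illustration of the techniques, not the patches in perf/references/patches/: `TtlCache` and `makeScreenshotCache` are hypothetical names, and the hash cache only skips the LLM call when the screenshot bytes are exactly identical.

```javascript
// Sketch of two caching ideas: a TTL cache for expensive context (e.g. a11y tree,
// taskbar) and a hash-keyed cache that skips the LLM on unchanged screenshots.
// Class and function names are hypothetical.
const crypto = require('node:crypto');

class TtlCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for testing
    this.entries = new Map();
  }

  async getOrCompute(key, compute) {
    const hit = this.entries.get(key);
    if (hit && this.now() - hit.at < this.ttlMs) return hit.value; // still fresh
    const value = await compute();
    this.entries.set(key, { value, at: this.now() });
    return value;
  }
}

// Identical screenshot bytes → reuse the previous LLM answer instead of calling again.
function makeScreenshotCache(callLlm) {
  let lastHash = null;
  let lastAnswer = null;
  return async (screenshot) => {
    const hash = crypto.createHash('sha256').update(screenshot).digest('hex');
    if (hash === lastHash) return lastAnswer;
    lastAnswer = await callLlm(screenshot);
    lastHash = hash;
    return lastAnswer;
  };
}
```

For example, `new TtlCache(2000)` would match the 2s TTL of #3 and `new TtlCache(30000)` the 30s TTL of #10.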
| Metric | v0.3 (VNC) | v0.4 (Native) | v0.4.1+ (Optimized) |
|--------|------------|---------------|----------------------|
| Screenshot capture | ~850ms | ~50ms | ~57ms |
| Screenshot size | ~200KB | ~120KB | ~58KB |
| A11y context (uncached) | N/A | ~600ms | ~462ms |
| A11y context (cached) | N/A | 0ms | 0ms (2s TTL) |
| Delays (per step) | N/A | 200-1500ms | 50-600ms |
| System prompt tokens | N/A | ~800 | ~300 |
perf/apply-optimizations.ps1: apply all patches
perf/perf-test.ts: benchmark harness (npx ts-node perf/perf-test.ts)
Generated Mar 1, 2026
An AI agent uses Clawd Cursor to open a helpdesk application like Zendesk or Freshdesk, read incoming support tickets, categorize them based on content, and draft initial responses or escalate to human agents. It can also update ticket statuses and log actions in a spreadsheet for reporting.
In a small business setting, the agent automates monthly financial tasks by opening accounting software like QuickBooks, extracting transaction data, generating reports in Excel, and uploading them to a cloud storage service or emailing them to stakeholders, ensuring timely compliance.
The agent interacts with a clinic's desktop scheduling system to book patient appointments, send confirmation emails via an email client, and update a calendar app with reminders. It can also handle rescheduling requests by reading incoming emails and modifying appointments accordingly.
For an online store, the agent automates order fulfillment by opening an e-commerce platform like Shopify, processing new orders, printing shipping labels, and updating inventory levels in a separate database app. It can also handle customer inquiries by reading messages and generating replies.
In an educational institution, the agent assists teachers by opening learning management systems like Canvas, downloading assignment submissions, grading them based on rubrics, and posting feedback to students. It can also curate online resources by browsing the web and saving links to a shared drive.
Offer a monthly subscription where businesses pay for access to AI agents that automate repetitive desktop tasks like data entry, report generation, and app interactions. Revenue comes from tiered plans based on usage levels and support features, targeting SMEs seeking efficiency gains.
Provide tailored solutions by developing specific automation workflows for enterprises using Clawd Cursor, including integration with legacy systems and training for staff. Revenue is generated through one-time project fees and ongoing maintenance contracts, focusing on industries with high manual overhead.
Release a free version of Clawd Cursor for basic automation tasks, with premium upgrades offering advanced features like multi-app orchestration, analytics dashboards, and priority support. Revenue streams include in-app purchases for additional capabilities and enterprise licenses for large-scale deployments.
💬 Integration Tip
Make sure Clawd Cursor is running locally before sending tasks, and use direct browser access via Playwright CDP for page reads, which is faster than sending a full task.