Desktop Control: Give Claude Eyes and Hands on Your Screen
With over 20,000 downloads and 149 stars, desktop-control by @matagul is one of ClawHub's most starred automation skills. It gives Claude complete control over your desktop: mouse movement, clicking, keyboard input, screenshots, pixel detection, image matching, window management, and clipboard — all locally, with no external service required.
The Problem It Solves
Most AI agent integrations work through APIs: call a web service, get structured data back. But most of the software people actually use every day — legacy enterprise tools, local desktop apps, games, creative software — has no API. The only interface it exposes is what you see on screen.
RPA (Robotic Process Automation) tools like UiPath and Automation Anywhere exist to solve this, but they're enterprise products with enterprise pricing and enterprise setup complexity. For a developer who needs Claude to automate a form submission in a desktop app, or a researcher who wants to extract data from a GUI-only tool, that overhead is prohibitive.
desktop-control solves this with a single Python class — built on pyautogui, pygetwindow, and pyperclip — that Claude can call to control any application visible on screen.
Architecture
The skill is built around the DesktopController class, which wraps PyAutoGUI with cleaner ergonomics, a built-in failsafe, and an optional approval mode. All operations are local — no cloud, no API key, no data leaving your machine.
from desktop_control import DesktopController
dc = DesktopController(failsafe=True)  # failsafe enabled by default
The failsafe=True setting is PyAutoGUI's standard emergency stop: if the mouse reaches any screen corner, execution aborts immediately.
Mouse Control
# Move to absolute coordinates (smooth easing by default)
dc.move_mouse(960, 540, duration=1.0)
# Move relative to current position
dc.move_relative(100, 0, duration=0.5)
# Single, double, and right-click
dc.click(500, 300)
dc.double_click(500, 300)
dc.right_click(500, 300)
# Drag and drop
dc.drag(100, 100, 400, 400, duration=0.5)
# Scroll (positive = up, negative = down)
dc.scroll(3) # scroll up 3 ticks
dc.scroll(-5, x=960, y=540) # scroll down at position
# Get current cursor position
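x, y = dc.get_mouse_position()
Smooth movement uses quadratic easing (easeInOutQuad) when duration > 0: the cursor accelerates out of the start point and decelerates into the target. That looks more like human mouse movement than an instant jump, and it may be less likely to trip simple anti-automation checks.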
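To make the easing idea concrete, here is a minimal sketch of the standard easeInOutQuad formula. It is not the skill's own code; the names ease_in_out_quad and interpolate are illustrative, and the straight-line path is an assumption.

```python
def ease_in_out_quad(t: float) -> float:
    """Standard quadratic ease-in-out: maps linear progress t in [0, 1] to eased progress."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    if t < 0.5:
        return 2 * t * t                    # accelerate through the first half
    return 1 - ((-2 * t + 2) ** 2) / 2      # decelerate through the second half


def interpolate(start: tuple[int, int], end: tuple[int, int], steps: int) -> list[tuple[int, int]]:
    """Hypothetical helper: cursor positions along an eased straight line from start to end."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(1, steps + 1):
        progress = ease_in_out_quad(i / steps)
        points.append((round(x0 + (x1 - x0) * progress),
                       round(y0 + (y1 - y0) * progress)))
    return points
```

The points bunch together near both ends of the path and spread out in the middle, which is the slow-fast-slow profile the easing function is meant to produce.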
Keyboard Control
# Type text at natural human speed (60 WPM)
dc.type_text("Hello from OpenClaw!", wpm=60)
# Type instantly
dc.type_text("instant input", interval=0)
# Press named keys
dc.press('enter')
dc.press('space', presses=3)
dc.press('f1')
# Keyboard shortcuts
dc.hotkey('ctrl', 'c') # Copy
dc.hotkey('ctrl', 'shift', 's') # Save As
dc.hotkey('win', 'r') # Windows Run dialog
dc.hotkey('cmd', 'tab') # macOS app switcher
# Hold and release (for complex combos)
dc.key_down('shift')
dc.click(800, 400) # shift+click
dc.key_up('shift')
The wpm parameter converts words-per-minute to a per-keystroke delay, which is useful for applications that break on paste-speed input.
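The skill's exact conversion isn't shown here, but one plausible version is sketched below. It assumes the common typing-test convention of five characters per word; both that constant and the function name wpm_to_interval are assumptions, not part of the skill.

```python
def wpm_to_interval(wpm: float, chars_per_word: int = 5) -> float:
    """Seconds to wait between keystrokes for a target words-per-minute rate.

    Assumes 5 characters per word (a typing-test convention); the skill
    itself may use a different constant.
    """
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    chars_per_minute = wpm * chars_per_word
    return 60.0 / chars_per_minute
```

Under that assumption, 60 WPM works out to 0.2 seconds per keystroke and 80 WPM to 0.15 seconds.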
Screen Operations
Screenshots
# Full screenshot, saved to file
dc.screenshot(filename="capture.png")
# Region screenshot (left, top, width, height)
dc.screenshot(region=(100, 100, 800, 600), filename="region.png")
# Return PIL Image object without saving
img = dc.screenshot()
Pixel Color Detection
r, g, b = dc.get_pixel_color(960, 540)
Useful for detecting UI state: whether a button is active (checking its color), whether a loading spinner is gone, or whether a field is highlighted.
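Exact RGB equality tends to be fragile, since themes, anti-aliasing, and display scaling can shift a pixel by a few units. A small tolerance check, sketched below, is one way to make state detection sturdier. The helper name color_matches and the default tolerance are illustrative choices, not part of the skill.

```python
RGB = tuple[int, int, int]


def color_matches(actual: RGB, expected: RGB, tolerance: int = 10) -> bool:
    """Return True when every channel of `actual` is within `tolerance` of `expected`."""
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))
```

In practice you would pass the tuple returned by dc.get_pixel_color(x, y) as actual, with an expected color sampled from a screenshot of the state you want to detect.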
Image Template Matching
# Find an image on screen (requires OpenCV)
location = dc.find_on_screen("button.png", confidence=0.8)
if location:
    center_x = location.left + location.width // 2
    center_y = location.top + location.height // 2
    dc.click(center_x, center_y)
Template matching lets Claude navigate UIs without knowing coordinates in advance: find a button by what it looks like, not where it is.
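A single lookup can fail simply because the target hasn't rendered yet, so a polling wrapper is often worth having. The sketch below assumes the finder returns a falsy value when nothing matches, as the `if location:` check above suggests. wait_for and box_center are hypothetical helpers, not part of the skill.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(find: Callable[[], Optional[T]], timeout: float = 10.0, poll: float = 0.5) -> Optional[T]:
    """Call find() until it returns a truthy result or `timeout` seconds elapse.

    find() is always called at least once, even with timeout=0.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def box_center(left: int, top: int, width: int, height: int) -> tuple[int, int]:
    """Center point of a bounding box, using the same integer arithmetic as above."""
    return left + width // 2, top + height // 2
```

A call might look like wait_for(lambda: dc.find_on_screen("button.png", confidence=0.8), timeout=5), followed by a click at the box center if a location comes back.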
Window Management
# List all open windows
windows = dc.get_all_windows()
# Get active window title
title = dc.get_active_window()
# Bring a window to front by partial title match
dc.activate_window("Notepad")
dc.activate_window("Chrome")
Clipboard
# Write to clipboard
dc.copy_to_clipboard("text to paste")
# Read from clipboard
text = dc.get_from_clipboard()
Combining clipboard with hotkeys enables fast data extraction: Ctrl+A → Ctrl+C → get_from_clipboard() (select all, copy, read).
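That sequence can be wrapped in a small function. The sketch below uses only the controller methods shown in this article (hotkey, pause, copy_to_clipboard, get_from_clipboard). The function name extract_all_text, the settle delay, and the step that clears the clipboard first are my own assumptions about what makes the pattern reliable.

```python
def extract_all_text(dc, settle: float = 0.2) -> str:
    """Select everything in the focused control, copy it, and return the clipboard text.

    `dc` is expected to behave like the DesktopController shown above.
    """
    dc.copy_to_clipboard("")    # clear stale contents so old text isn't mistaken for a result
    dc.hotkey("ctrl", "a")      # select all (on macOS, "cmd" would replace "ctrl")
    dc.hotkey("ctrl", "c")      # copy the selection
    dc.pause(settle)            # give the application time to populate the clipboard
    return dc.get_from_clipboard()
```

An empty return value then means the copy did not land, rather than silently handing back whatever was on the clipboard before.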
Safety Features
Failsafe (Always On by Default)
dc = DesktopController(failsafe=True)
Move your mouse to any corner of the screen and execution stops immediately. This is the emergency brake: use it if automation goes wrong.
Approval Mode
dc = DesktopController(require_approval=True)
Every action prompts the user for confirmation before executing. Useful during development to step through automation one action at a time.
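How the skill implements the prompt isn't shown here, but the general confirm-before-acting pattern is easy to sketch. Everything below is hypothetical, including the name run_with_approval and the y/N prompt format.

```python
from typing import Callable


def run_with_approval(description: str,
                      action: Callable[[], object],
                      ask: Callable[[str], str] = input) -> bool:
    """Ask before running `action`; return True if it ran, False if the user declined.

    `ask` defaults to the built-in input() and can be swapped out for testing.
    """
    answer = ask(f"Run '{description}'? [y/N] ").strip().lower()
    if answer in ("y", "yes"):
        action()
        return True
    return False
```

Defaulting to "no" on an empty answer means an accidental Enter press skips the action instead of executing it.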
Programmatic Safety Check
if dc.is_safe():
    dc.click(500, 300)
is_safe() checks that the mouse is not in a screen corner, making it the programmatic equivalent of the failsafe.
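The skill's implementation of is_safe() isn't reproduced here. As a rough illustration of what a corner test involves, the sketch below checks a cursor position against the four screen corners. The function name in_corner and the margin parameter are assumptions.

```python
def in_corner(x: int, y: int, screen_width: int, screen_height: int, margin: int = 0) -> bool:
    """Return True when (x, y) lies within `margin` pixels of any of the four screen corners."""
    near_left = x <= margin
    near_right = x >= screen_width - 1 - margin
    near_top = y <= margin
    near_bottom = y >= screen_height - 1 - margin
    # A corner requires being at a horizontal edge AND a vertical edge at once.
    return (near_left or near_right) and (near_top or near_bottom)
```

Touching a single screen edge does not count; only positions at both a horizontal and a vertical extreme do.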
Practical Automation Example
Opening the Windows Run dialog, launching Notepad, typing text, and triggering Save:
dc = DesktopController(failsafe=True)
# Open Windows Run dialog
dc.hotkey('win', 'r')
dc.pause(0.5)
# Type and execute
dc.type_text('notepad', wpm=80)
dc.press('enter')
dc.pause(1.0)
# Type content in Notepad
dc.type_text("Automated by OpenClaw\n", wpm=60)
dc.hotkey('ctrl', 's')
desktop-control vs. Alternatives
| Feature | desktop-control | Browser automation (Playwright) | RPA tools (UiPath) |
|---|---|---|---|
| Any desktop app | ✅ | ❌ (browser only) | ✅ |
| No API required | ✅ | ✅ | ✅ |
| Image template matching | ✅ | ❌ | ✅ |
| Local/private | ✅ | ✅ | ⚠️ cloud options |
| Setup complexity | low | medium | high |
| Cost | free | free | $$$ |
| Failsafe abort | ✅ built in | ❌ | varies |
How to Install
clawdbot install desktop-control
The skill requires Python and several packages:
pip install pyautogui pygetwindow pyperclip
# For image template matching:
pip install opencv-python
On macOS, pygetwindow has limited support, so window activation may not work for all apps. On Linux, PyAutoGUI generally needs python3-xlib for input control and scrot for screenshots; exact requirements vary with your display server.
Practical Tips
- Always keep failsafe enabled. Move your mouse to any screen corner to abort a runaway automation. There's almost never a reason to set failsafe=False.
- Use duration on mouse moves for reliability. Instant moves (duration=0) can miss targets on high-DPI displays or in fast-rendering UIs. Even duration=0.1 improves reliability significantly.
- Add pauses after opening dialogs or applications. dc.pause(0.5) after a hotkey or app launch gives the UI time to render before the next action. Skipping pauses is the most common cause of failed automation.
- Use template matching for dynamic UIs. If button positions change between sessions (resizable windows, dynamic layouts), use find_on_screen() with a screenshot of the button rather than hardcoded coordinates.
- Combine clipboard with Ctrl+A for extraction. To extract all text from a text field or document: dc.hotkey('ctrl', 'a') → dc.hotkey('ctrl', 'c') → text = dc.get_from_clipboard(). Faster and more reliable than reading pixel by pixel.
- Use require_approval=True during development. Step through your automation confirming each action before it runs. Once the sequence is verified, switch to require_approval=False for automated runs.
Considerations
- Platform differences: Mouse coordinate systems and window management APIs differ between Windows, macOS, and Linux. Test on your target OS — don't assume cross-platform behavior.
- Detection risk: Some applications detect and block automated input (anti-cheat systems, CAPTCHAs, banking apps). The skill doesn't include anti-detection; use it where automation is permitted.
- No visual understanding by default: The skill can take screenshots and return pixel data, but Claude needs to interpret visual content (OCR, object detection) separately. For reading text from screenshots, use Claude's vision capability alongside the screenshots desktop-control captures.
- Coordinate brittleness: Hardcoded coordinates break when screen resolution, scaling, or window size changes. Template matching or window-relative positioning is more robust.
- pygetwindow limitations: Window management is most reliable on Windows. macOS and Linux support is partial and may require additional setup.
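On the coordinate brittleness point, window-relative positioning can be as simple as expressing targets as fractions of the window's size. The sketch below is hypothetical throughout: WindowRect and to_screen are illustrative names, and how you obtain the window's geometry is left open because the window API examples above don't show it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowRect:
    """Hypothetical window geometry in absolute screen pixels."""
    left: int
    top: int
    width: int
    height: int


def to_screen(rect: WindowRect, fx: float, fy: float) -> tuple[int, int]:
    """Convert a fractional position inside the window (0.0 to 1.0) to screen coordinates."""
    if not (0.0 <= fx <= 1.0 and 0.0 <= fy <= 1.0):
        raise ValueError("fx and fy must be between 0 and 1")
    # Clamp to the last pixel so fx=1.0 stays inside the window.
    x = rect.left + min(int(fx * rect.width), rect.width - 1)
    y = rect.top + min(int(fy * rect.height), rect.height - 1)
    return x, y
```

A target stored as (0.5, 0.5) still lands in the middle of the window after it is moved or resized, which a hardcoded pair like (500, 300) would not.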
The Bigger Picture
desktop-control represents a fundamental expansion of what AI agents can act on. Web APIs, databases, and file systems are increasingly well-covered. But the vast majority of software that businesses actually run — legacy CRMs, ERP systems, specialized desktop tools — has no API. The only way to automate them is the same way a human does: look at the screen and control the mouse and keyboard.
With 149 stars, desktop-control is clearly filling a gap that other integration approaches leave untouched. It's not a replacement for proper API integration where one exists — but where it doesn't, this is the tool that makes automation possible.
View the skill on ClawHub: desktop-control