windows-control
Full Windows desktop control. Mouse, keyboard, screenshots - interact with any Windows application like a human.
Install via ClawdBot CLI:
clawdbot install Spliff7777/windows-control
Full desktop automation for Windows. Control mouse, keyboard, and screen like a human user.
All scripts are in skills/windows-control/scripts/
py screenshot.py > output.b64
Returns base64 PNG of entire screen.
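To view the capture, the base64 text has to be decoded back into image bytes. The following is a minimal sketch, assuming output.b64 contains only the base64 string (possibly with surrounding whitespace). The helper name `decode_screenshot` and the file name screenshot.png are illustrative and not part of the skill.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_screenshot(b64_text: str) -> bytes:
    """Decode base64 text into PNG bytes, rejecting anything that is not a PNG."""
    # Drop whitespace and newlines that shell redirection may have added.
    cleaned = "".join(b64_text.split())
    data = base64.b64decode(cleaned, validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with a PNG signature")
    return data


if __name__ == "__main__":
    src = Path("output.b64")
    if src.exists():
        # utf-8-sig tolerates a byte-order mark; other encodings need adjusting.
        png = decode_screenshot(src.read_text(encoding="utf-8-sig"))
        Path("screenshot.png").write_bytes(png)
```

Depending on the shell, `>` may write the file in UTF-16 rather than UTF-8 (Windows PowerShell is one such case). If decoding fails, changing the `encoding` argument is the first thing to try.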
py click.py 500 300 # Left click at (500, 300)
py click.py 500 300 right # Right click
py click.py 500 300 left 2 # Double click
py type_text.py "Hello World"
Types text at current cursor position (10ms between keys).
py key_press.py "enter"
py key_press.py "ctrl+s"
py key_press.py "alt+tab"
py key_press.py "ctrl+shift+esc"
py mouse_move.py 500 300
Moves mouse to coordinates (smooth 0.2s animation).
py scroll.py up 5 # Scroll up 5 notches
py scroll.py down 10 # Scroll down 10 notches
py focus_window.py "Chrome" # Bring window to front
py minimize_window.py "Notepad" # Minimize window
py maximize_window.py "VS Code" # Maximize window
py close_window.py "Calculator" # Close window
py get_active_window.py # Get title of active window
# Click by text (No coordinates needed!)
py click_text.py "Save" # Click "Save" button anywhere
py click_text.py "Submit" "Chrome" # Click "Submit" in Chrome only
# Drag and Drop
py drag.py 100 100 500 300 # Drag from (100,100) to (500,300)
# Robust Automation (Wait/Find)
py wait_for_text.py "Ready" "App" 30 # Wait up to 30s for text
py wait_for_window.py "Notepad" 10 # Wait for window to appear
py find_text.py "Login" "Chrome" # Get coordinates of text
py list_windows.py # List all open windows
py read_window.py "Notepad" # Read all text from Notepad
py read_window.py "Visual Studio" # Read text from VS Code
py read_window.py "Chrome" # Read text from browser
Uses Windows UI Automation to extract the actual text rather than OCR, which is faster and more accurate than reading text from screenshots.
py read_ui_elements.py "Chrome" # All interactive elements
py read_ui_elements.py "Chrome" --buttons-only # Just buttons
py read_ui_elements.py "Chrome" --links-only # Just links
py read_ui_elements.py "Chrome" --json # JSON output
Returns buttons, links, tabs, checkboxes, dropdowns with coordinates for clicking.
py read_webpage.py # Read active browser
py read_webpage.py "Chrome" # Target Chrome specifically
py read_webpage.py "Chrome" --buttons # Include buttons
py read_webpage.py "Chrome" --links # Include links with coords
py read_webpage.py "Chrome" --full # All elements (inputs, images)
py read_webpage.py "Chrome" --json # JSON output
Enhanced browser content extraction with headings, text, buttons, and links.
# List all open dialogs
py handle_dialog.py list
# Read current dialog content
py handle_dialog.py read
py handle_dialog.py read --json
# Click button in dialog
py handle_dialog.py click "OK"
py handle_dialog.py click "Save"
py handle_dialog.py click "Yes"
# Type into dialog text field
py handle_dialog.py type "myfile.txt"
py handle_dialog.py type "C:\path\to\file" --field 0
# Dismiss dialog (auto-finds OK/Close/Cancel)
py handle_dialog.py dismiss
# Wait for dialog to appear
py handle_dialog.py wait --timeout 10
py handle_dialog.py wait "Save As" --timeout 5
Handles Save/Open dialogs, message boxes, alerts, confirmations, etc.
py click_element.py "Save" # Click "Save" anywhere
py click_element.py "OK" --window "Notepad" # In specific window
py click_element.py "Submit" --type Button # Only buttons
py click_element.py "File" --type MenuItem # Menu items
py click_element.py --list # List clickable elements
py click_element.py --list --window "Chrome" # List in specific window
Click buttons, links, menu items by name without needing coordinates.
py read_region.py 100 100 500 300 # Read text from coordinates
Note: Requires Tesseract OCR to be installed. Prefer read_window.py where possible; it gives better results.
# Press Windows key
py key_press.py "win"
# Type "notepad"
py type_text.py "notepad"
# Press Enter
py key_press.py "enter"
# Wait a moment, then type
py type_text.py "Hello from AI!"
# Save
py key_press.py "ctrl+s"
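The same sequence can be driven from a single Python script instead of typing each command by hand. This is a sketch only: the names `NOTEPAD_STEPS`, `build_command` and `run_steps` are hypothetical, the scripts path is the one given earlier in this document, and a fixed pause after every step is a crude stand-in for "wait a moment".

```python
import subprocess
import time
from typing import Callable, List, Optional, Sequence, Tuple

SCRIPTS_DIR = "skills/windows-control/scripts"  # path given in this document

# Each step is (script name, arguments), mirroring the commands above.
NOTEPAD_STEPS: List[Tuple[str, List[str]]] = [
    ("key_press.py", ["win"]),
    ("type_text.py", ["notepad"]),
    ("key_press.py", ["enter"]),
    ("type_text.py", ["Hello from AI!"]),
    ("key_press.py", ["ctrl+s"]),
]


def build_command(script: str, args: Sequence[str]) -> List[str]:
    """Build the argv list for one script call."""
    return ["py", f"{SCRIPTS_DIR}/{script}", *args]


def run_steps(
    steps: Sequence[Tuple[str, Sequence[str]]],
    pause: float = 1.0,
    runner: Optional[Callable[[List[str]], object]] = None,
) -> List[List[str]]:
    """Run steps in order, sleeping `pause` seconds after each; return the commands run."""
    if runner is None:
        # check=True stops the sequence if a script exits with a non-zero status.
        def runner(cmd: List[str]) -> object:
            return subprocess.run(cmd, check=True)

    executed = []
    for script, args in steps:
        cmd = build_command(script, args)
        runner(cmd)
        executed.append(cmd)
        time.sleep(pause)  # give the target application time to react
    return executed
```

Calling `run_steps(NOTEPAD_STEPS)` on a Windows machine with the skill installed would execute the five commands in order. Passing a custom `runner` (for example `print`) gives a dry run that shows the commands without touching the desktop.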
# Read current VS Code content
py read_window.py "Visual Studio Code"
# Click at specific location (e.g., file explorer)
py click.py 50 100
# Type filename
py type_text.py "test.js"
# Press Enter
py key_press.py "enter"
# Verify new file opened
py read_window.py "Visual Studio Code"
# Read current content
py read_window.py "Notepad"
# User types something...
# Read updated content (no screenshot needed!)
py read_window.py "Notepad"
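Once the window has been read twice, the two snapshots can be compared to see what the user added. The sketch below works on plain strings and uses Python's standard difflib. It assumes read_window.py prints the window text to standard output, which this document does not state explicitly; `added_lines` is a hypothetical helper name.

```python
import difflib
from typing import List


def added_lines(before: str, after: str) -> List[str]:
    """Return lines that appear in `after` but not in `before`, in order."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # ndiff prefixes lines unique to the second sequence with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]


if __name__ == "__main__":
    first_read = "Hello from AI!"
    second_read = "Hello from AI!\nA line the user typed"
    for line in added_lines(first_read, second_read):
        print("new:", line)
```

In practice the two strings would come from two captures of the command's output, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.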
Method 1: Windows UI Automation (BEST)
read_window.py for any window
read_ui_elements.py for buttons/links with coordinates
read_webpage.py for browser content with structure
Method 2: Click by Name (NEW)
click_element.py to click buttons/links by name
Method 3: Dialog Handling (NEW)
handle_dialog.py for popups, save dialogs, alerts
Method 4: Screenshot + Vision (Fallback)
Method 5: OCR (Optional)
read_region.py with Tesseract
pyautogui.FAILSAFE = True (move mouse to top-left to abort)
ctrl+z friendly actions when possible
Status: ✅ READY FOR USE (v2.0 - Dialog & UI Elements)
Created: 2026-02-01
Updated: 2026-02-02
Generated Mar 1, 2026
QA teams can automate repetitive testing workflows across Windows applications without modifying source code. The skill enables clicking UI elements, typing test data, and validating screen content through text extraction and screenshot comparison.
Businesses can automate data entry, form filling, and report generation across legacy Windows applications like ERP systems. The skill handles dialog boxes, clicks buttons by name, and extracts text from windows for data validation.
Organizations can automate testing of Windows applications for accessibility compliance by programmatically interacting with UI elements and extracting text content. The skill helps verify screen reader compatibility and keyboard navigation workflows.
IT departments can create automated troubleshooting scripts that interact with Windows system tools, control panels, and user applications. The skill enables clicking through installation wizards, configuring settings, and gathering system information.
Companies migrating between Windows applications can automate data transfer by reading content from source applications and typing into destination systems. The skill handles window switching, text extraction, and form completion across multiple applications.
Offer automated workflow solutions to businesses using legacy Windows applications. Charge monthly subscription fees for maintaining and executing automation scripts that handle data entry, report generation, and system integration tasks.
Build a testing platform that uses this skill to automate QA processes for Windows applications. Sell licenses to software development companies who need to test their desktop applications across different scenarios without manual intervention.
Provide consulting services to implement desktop automation solutions for specific business processes. Charge project-based fees for designing, developing, and deploying automation scripts that leverage the skill's capabilities for client workflows.
💬 Integration Tip
Start with simple click and type operations, then gradually incorporate text reading and dialog handling for more robust automation workflows.