mac-use
Control macOS GUI apps visually: take screenshots, click, scroll, type. Use when the user asks to interact with any Mac desktop application's graphical interface.
Install via ClawdBot CLI:
clawdbot install kekejun/mac-use
Control any macOS GUI application through a screenshot → pick element → click → verify loop.
Platform: macOS only (requires Apple Vision framework for OCR)
System binaries (pre-installed on macOS):
- python3: via Homebrew (brew install python)
- screencapture: built-in macOS utility
Python packages (install from the skill directory):
pip3 install --break-system-packages -r {baseDir}/requirements.txt
The screenshot command captures a window, uses Apple Vision OCR to detect all text elements, draws numbered annotations on the image, and returns both:
- /tmp/mac_use.png: numbered green boxes around each detected text
- [{num: 1, text: "Submit", at: [500, 200]}, {num: 2, text: "Cancel", at: [600, 200]}, ...] where at is the center point [x, y] on the 1000x1000 canvas (origin at top-left)
You receive both by calling Bash (gets the JSON with the element list) and then Read on /tmp/mac_use.png (gets the visual). Always do both so you can cross-reference the numbers with what you see.
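Because the element list is plain JSON, you can resolve a number from its OCR text before clicking. A minimal sketch, using a hard-coded sample in the shape shown above (the real list comes from the screenshot command):

```shell
# Sample element list in the shape screenshot returns (assumed for illustration)
elements='[{"num": 1, "text": "Submit", "at": [500, 200]}, {"num": 2, "text": "Cancel", "at": [600, 200]}]'

# Look up the element number whose OCR text is "Cancel"
echo "$elements" | python3 -c '
import json, sys
elements = json.load(sys.stdin)
match = next(e for e in elements if e["text"] == "Cancel")
print(match["num"])
'
# → 2
```

The printed number is what you would then pass to clicknum.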
# List all visible windows
python3 {baseDir}/scripts/mac_use.py list
# Screenshot + annotate (returns image + numbered element list)
python3 {baseDir}/scripts/mac_use.py screenshot <app> [--id N]
# Click element by number (primary click method)
python3 {baseDir}/scripts/mac_use.py clicknum <N>
# Click at canvas coordinates (fallback for unlabeled icons)
python3 {baseDir}/scripts/mac_use.py click --app <app> [--id N] <x> <y>
# Scroll inside a window
python3 {baseDir}/scripts/mac_use.py scroll --app <app> [--id N] <direction> <amount>
# Type text (uses clipboard paste, so all languages are supported)
python3 {baseDir}/scripts/mac_use.py type [--app <app>] "text here"
# Press key or combo
python3 {baseDir}/scripts/mac_use.py key [--app <app>] <combo>
open -a "App Name" (optionally with a URL or file path)
sleep 2
python3 {baseDir}/scripts/mac_use.py screenshot <app> [--id N]
This returns JSON with file (image path) and elements (numbered text list).
Then:
- Read /tmp/mac_use.png to see the numbered elements visually
- clicknum N: pick the number of a detected text element
- click --app x y: only for unlabeled icons (arrows, close buttons, cart icons) that have no text and therefore no number
- Re-screenshot to verify after each clicknum, type, key, or scroll

Show all visible app windows.
python3 {baseDir}/scripts/mac_use.py list
Returns JSON array: [{"app":"Google Chrome","title":"Wikipedia","id":4527,"x":120,"y":80,"w":1200,"h":800}, ...]
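Because list prints JSON, picking the --id for a busy app can be scripted. A sketch with a hard-coded sample in the shape shown above (real output comes from the list command):

```shell
# Sample `list` output (stand-in for the real command's JSON)
windows='[{"app": "Google Chrome", "title": "Wikipedia", "id": 4527, "x": 120, "y": 80, "w": 1200, "h": 800}, {"app": "Notes", "title": "Untitled", "id": 9001, "x": 0, "y": 0, "w": 800, "h": 600}]'

# Pick the id of the first window whose app name contains "chrome"
# (case-insensitive), mirroring the CLI's own fuzzy matching
echo "$windows" | python3 -c '
import json, sys
wins = json.load(sys.stdin)
win = next(w for w in wins if "chrome" in w["app"].lower())
print(win["id"])
'
# → 4527
```

The printed id is what you would pass as --id when several windows of the same app exist.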
Capture a window, detect text elements via OCR, annotate with numbered markers, and return the element list. The target window is automatically raised to the top before capture, so overlapping windows are handled.
python3 {baseDir}/scripts/mac_use.py screenshot chrome
python3 {baseDir}/scripts/mac_use.py screenshot chrome --id 4527
<app>: fuzzy, case-insensitive match (e.g. "chrome" matches "Google Chrome")
--id N: target a specific window ID (required when multiple windows of the same app exist)

Returns:
- file: path to the annotated screenshot (/tmp/mac_use.png)
- id, app, title, scale: window metadata
- elements: array of {num, text, at}, the numbered clickable text elements, where at is the [x, y] center on the 1000x1000 canvas (origin at top-left)

The element list is also saved to /tmp/mac_use_elements.json for clicknum.

Click on a numbered element from the last screenshot. This is the primary click method.
python3 {baseDir}/scripts/mac_use.py clicknum 5
python3 {baseDir}/scripts/mac_use.py clicknum 12
N: the element number from the last screenshot output
Returns: clicked_num, text, canvas coords, and absolute screen coords

Click at a position using canvas coordinates. Fallback only: use for unlabeled icons.
python3 {baseDir}/scripts/mac_use.py click --app chrome 500 300
python3 {baseDir}/scripts/mac_use.py click --app chrome --id 4527 500 300
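For reference, the absolute screen coordinates that clicks resolve to can presumably be derived by scaling the canvas point into the window frame that list reports. A sketch of that mapping; the linear formula here is an assumption, not taken from the tool's source:

```shell
# Map a canvas point to absolute screen coordinates, using the window
# frame from the `list` example above (assumed linear scaling).
python3 -c '
win = {"x": 120, "y": 80, "w": 1200, "h": 800}  # window frame from list
cx, cy = 500, 300                               # canvas point (0-1000 per axis)
abs_x = win["x"] + cx / 1000 * win["w"]
abs_y = win["y"] + cy / 1000 * win["h"]
print(int(abs_x), int(abs_y))
'
# → 720 320
```

This is only to show why canvas coordinates are window-relative; the click command does the conversion for you.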
Scroll inside an app window.
python3 {baseDir}/scripts/mac_use.py scroll --app chrome down 5
python3 {baseDir}/scripts/mac_use.py scroll --app notes up 10
Direction: up, down, left, right

Type text into the currently focused input field.
python3 {baseDir}/scripts/mac_use.py type --app chrome "hello world"
python3 {baseDir}/scripts/mac_use.py type --app chrome "你好世界"
--app: activates the app first to ensure keystrokes go to the right window

Press a single key or key combination.
python3 {baseDir}/scripts/mac_use.py key --app chrome return
python3 {baseDir}/scripts/mac_use.py key --app chrome cmd+a
python3 {baseDir}/scripts/mac_use.py key --app chrome cmd+shift+s
--app: activates the app first
Keys: return, tab, escape, space, delete, backspace, up, down, left, right
Modifiers: cmd, ctrl, alt/opt, shift

Tips:
- Prefer clicknum over click; only use direct coordinates for unlabeled icons
- On a multiple_windows error, use list to see all windows, then pass --id
- When an app has several windows, use list to find them and --id to target them
- sleep 2-3 after open -a before taking a screenshot
- Run osascript -e 'tell application "AppName" to activate' && sleep 1 when the target app may be behind other windows

Coordinates (click only): screenshots are rendered onto a 1000x1000 canvas, and the x and y passed to click use these canvas coordinates (origin at top-left).

Example:
# 1. Open WeChat
open -a "WeChat"
sleep 3
# 2. List windows to find the mini program window
python3 {baseDir}/scripts/mac_use.py list
# → find the mini program window ID
# 3. Screenshot the mini program (annotated + element list)
python3 {baseDir}/scripts/mac_use.py screenshot 微信 --id 41266
# → returns: {"file": "/tmp/mac_use.png", "elements": [{num: 1, text: "搜索", at: [500, 200]}, ...]}
# → Read /tmp/mac_use.png to see the annotated image
# 4. Click "搜索" ("Search", element #1)
python3 {baseDir}/scripts/mac_use.py clicknum 1
# 5. Type search query
python3 {baseDir}/scripts/mac_use.py type --app 微信 "炸鸡"
# 6. Press Enter
python3 {baseDir}/scripts/mac_use.py key --app 微信 return
sleep 2
# 7. Screenshot to see results
python3 {baseDir}/scripts/mac_use.py screenshot ๅพฎไฟก --id 41266
# → Read /tmp/mac_use.png, pick a restaurant by number
# 8. Click on a restaurant (e.g. element #5)
python3 {baseDir}/scripts/mac_use.py clicknum 5
Generated Mar 1, 2026
This skill enables automated GUI testing for macOS applications by simulating user interactions like clicking buttons, typing text, and verifying screen states through OCR. It's ideal for QA teams to run regression tests on desktop apps without manual effort, ensuring UI elements function correctly across updates.
Use the skill to automate repetitive data entry tasks in macOS GUI applications, such as inputting customer information into CRM systems or updating spreadsheets. It reduces human error and saves time by visually identifying fields and entering data programmatically.
Assist users with disabilities by automating interactions with macOS apps through voice commands or scripts. The skill can click on-screen elements, type text, and navigate interfaces, making computers more accessible for individuals with motor or visual impairments.
Automate order fulfillment processes on macOS-based e-commerce platforms by clicking through order lists, confirming shipments, and updating statuses. This streamlines operations for small businesses using desktop apps without API access.
Create interactive tutorials for macOS educational apps by guiding users through steps with automated clicks and typing. It helps in training scenarios where learners need hands-on practice with software interfaces in a controlled environment.
💬 Integration Tip
Ensure Python 3 and required dependencies are installed via Homebrew, and always cross-reference the annotated screenshot with the JSON element list to avoid misclicks in automation workflows.