browserautomation-skillAdvanced headless browser automation skill for OpenClaw agents. Enables intelligent web navigation, form filling, data extraction, and UI testing with structured commands and semantic element targeting.
Install via ClawdBot CLI:
clawdbot install StveenLi/browserautomation-skillRequires:
READ THE PROJECT README.md FIRST!
Before executing any browser automation tasks:
npm install -g agent-browser
agent-browser install --with-deps
agent-browser --version
If installation fails, try:
npx agent-browser install --with-deps
Every browser automation task follows this pattern:
1. OPEN -> Navigate to target URL
2. SNAPSHOT -> Analyze page structure, get element refs
3. INTERACT -> Click, fill, select using refs
4. VERIFY -> Re-snapshot to confirm changes
5. REPEAT -> Continue until task complete
6. CLOSE -> Clean up browser session
| Command | Description |
|---------|-------------|
| agent-browser open | Navigate to URL |
| agent-browser back | Go back in history |
| agent-browser forward | Go forward |
| agent-browser reload | Reload current page |
| agent-browser close | Close browser session |
| Command | Description |
|---------|-------------|
| agent-browser snapshot -i | Most used: Interactive elements with refs |
| agent-browser snapshot -i -c | Compact interactive snapshot |
| agent-browser snapshot -s "#main" | Scope to specific container |
| agent-browser snapshot -d 3 | Limit tree depth |
| Command | Description |
|---------|-------------|
| agent-browser click @e1 | Click element |
| agent-browser fill @e1 "text" | Clear field and type (preferred for inputs) |
| agent-browser type @e1 "text" | Type without clearing |
| agent-browser press Enter | Press keyboard key |
| agent-browser press Control+a | Key combination |
| agent-browser select @e1 "value" | Select dropdown option |
| agent-browser check @e1 | Check checkbox |
| agent-browser uncheck @e1 | Uncheck checkbox |
| agent-browser hover @e1 | Hover over element |
| agent-browser upload @e1 file.pdf | Upload file |
| Command | Description |
|---------|-------------|
| agent-browser get text @e1 | Get element text content |
| agent-browser get html @e1 | Get inner HTML |
| agent-browser get value @e1 | Get input field value |
| agent-browser get attr @e1 href | Get specific attribute |
| agent-browser get title | Get page title |
| agent-browser get url | Get current URL |
| agent-browser get count ".selector" | Count matching elements |
| Command | Description |
|---------|-------------|
| agent-browser wait @e1 | Wait for element to appear |
| agent-browser wait 2000 | Wait milliseconds |
| agent-browser wait --text "Success" | Wait for text to appear |
| agent-browser wait --url "/dashboard" | Wait for URL change |
| agent-browser wait --load networkidle | Wait for network idle |
| Command | Description |
|---------|-------------|
| agent-browser screenshot out.png | Save screenshot |
| agent-browser screenshot --full out.png | Full page screenshot |
| agent-browser pdf output.pdf | Save page as PDF |
# 1. Open login page
agent-browser open https://example.com/login
# 2. Get interactive elements
agent-browser snapshot -i
# Output: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign in" [ref=e3]
# 3. Fill credentials
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "secure_password"
# 4. Submit
agent-browser click @e3
# 5. Wait for redirect
agent-browser wait --url "/dashboard"
agent-browser wait --load networkidle
# 6. Save session for reuse
agent-browser state save session.json
# 7. Verify success
agent-browser snapshot -i
# Navigate to listing page
agent-browser open https://example.com/products
# Get initial snapshot
agent-browser snapshot -i
# Extract data from each item
agent-browser get text @e1
agent-browser get attr @e2 href
# Check for pagination
agent-browser snapshot -s ".pagination"
# Click next if exists
agent-browser click @e5
agent-browser wait --load networkidle
# Re-snapshot for new content
agent-browser snapshot -i
# Open form
agent-browser open https://example.com/contact
# Analyze form structure
agent-browser snapshot -i
# Fill all required fields
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser fill @e3 "Hello, this is my message."
# Select dropdown if present
agent-browser select @e4 "support"
# Check required checkbox
agent-browser check @e5
# Submit form
agent-browser click @e6
# Wait and verify submission
agent-browser wait --text "Thank you"
agent-browser snapshot -i
# First time: Login and save state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "/home"
agent-browser state save auth-session.json
# Later: Restore session and continue
agent-browser state load auth-session.json
agent-browser open https://app.example.com/dashboard
# Already logged in!
# Open first site
agent-browser open https://site-a.com
# Open second tab
agent-browser tab new https://site-b.com
# List tabs
agent-browser tab
# Output: Tab 1: site-a.com, Tab 2: site-b.com (active)
# Switch between tabs
agent-browser tab 1
agent-browser snapshot -i
# Work on tab 1...
agent-browser tab 2
agent-browser snapshot -i
# Work on tab 2...
# Enable headed mode to see what's happening
agent-browser open https://example.com --headed
# Check for JavaScript errors
agent-browser errors
# View console output
agent-browser console
# Highlight element to verify selection
agent-browser highlight @e1
# Take screenshot for debugging
agent-browser screenshot debug.png
# Start trace for detailed analysis
agent-browser trace start
# ... perform actions ...
agent-browser trace stop trace.zip
When refs are unstable or you need more readable selectors:
# Find by role
agent-browser find role button click --name "Submit"
# Find by text content
agent-browser find text "Sign In" click
# Find by label
agent-browser find label "Email" fill "user@test.com"
# Find first matching selector
agent-browser find first ".item" click
# Find nth element
agent-browser find nth 2 "a" text
# Mock API response
agent-browser network route "*/api/user" --body '{"name":"Test User"}'
# Block analytics/ads
agent-browser network route "*google-analytics*" --abort
agent-browser network route "*facebook.com*" --abort
# View captured requests
agent-browser network requests --filter api
# Remove routes
agent-browser network unroute
# Set viewport for responsive testing
agent-browser set viewport 1920 1080
agent-browser set viewport 375 667 # Mobile
# Emulate device
agent-browser set device "iPhone 14"
agent-browser set device "Pixel 5"
# Set geolocation
agent-browser set geo 40.7128 -74.0060 # New York
# Dark mode testing
agent-browser set media dark
agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser snapshot -i # ALWAYS do this after navigation
# GOOD: Clears existing text first
agent-browser fill @e1 "new text"
# BAD: Appends to existing text
agent-browser type @e1 "new text"
# After clicking that triggers navigation
agent-browser click @e1
agent-browser wait --load networkidle
# After AJAX updates
agent-browser click @e1
agent-browser wait --text "Updated"
# Switch to iframe before interacting
agent-browser frame "#iframe-id"
agent-browser snapshot -i
agent-browser click @e1
# Return to main frame
agent-browser frame main
# Save immediately after successful login
agent-browser state save session.json
# Can reload if something breaks later
# Re-snapshot to get updated refs
agent-browser snapshot -i
# Try semantic locator
agent-browser find text "Button Text" click
# Check if element is in iframe
agent-browser frame "#iframe"
agent-browser snapshot -i
# Increase timeout
agent-browser open https://slow-site.com --timeout 60000
# Wait explicitly
agent-browser wait --load networkidle
agent-browser wait 5000
# Reload saved state
agent-browser state load session.json
agent-browser reload
# Visual debugging
agent-browser open https://example.com --headed
agent-browser screenshot debug.png
agent-browser errors
agent-browser console
For working with multiple isolated browsers:
# Session 1
agent-browser --session user1 open https://app.com
agent-browser --session user1 snapshot -i
# Session 2 (completely isolated)
agent-browser --session user2 open https://app.com
agent-browser --session user2 snapshot -i
# List all sessions
agent-browser session list
# Each session has separate cookies, storage, and state
Add --json flag to get machine-readable output:
agent-browser snapshot -i --json | jq '.elements[]'
agent-browser get text @e1 --json
agent-browser get url --json
# Start recording from current page
agent-browser record start ./demo.webm
# Perform your automation
agent-browser click @e1
agent-browser fill @e2 "text"
# Stop and save
agent-browser record stop
agent-browser install --with-depsagent-browser snapshot -i to refresh refs--timeout 60000 for slow pages--headed flagagent-browser state save/loadagent-browser frame to switch contextGenerated Feb 28, 2026
Automate daily price checks across competitor websites by navigating to product pages, extracting current prices, and logging them into a spreadsheet. This helps businesses adjust pricing strategies dynamically based on market trends.
Log into banking or investment portals, navigate to statement sections, and extract transaction data into structured formats like CSV for analysis. This automates manual data entry for accounting and auditing processes.
Automate booking appointments on clinic websites by filling patient forms, selecting time slots, and confirming bookings. This reduces administrative workload and minimizes scheduling errors in medical facilities.
Scrape property listings from multiple real estate websites, extracting details like price, location, and amenities into a centralized database. This aids agents in comparing market offerings quickly.
Automate enrollment processes for online courses by logging into educational platforms, filling registration forms, and submitting payments. This streamlines onboarding for institutions with high student volumes.
Offer browser automation as a cloud-based service with tiered pricing based on usage limits and features. This model provides recurring revenue through monthly or annual subscriptions for businesses needing scalable automation.
Provide custom automation solutions for clients, including workflow design, implementation, and maintenance. This model generates project-based fees and ongoing support contracts tailored to specific industry needs.
Use automation to collect and sell structured data from websites, such as market trends or competitor insights, to businesses. This model creates revenue streams from data licensing and API access for analytics.
💬 Integration Tip
Integrate with existing CI/CD pipelines for automated UI testing, ensuring browser sessions are managed efficiently to avoid resource leaks during continuous deployment.
A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.
Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, navigate web pages, extract data from websites, take screenshots, fill forms, click buttons, or interact with web applications.
Advanced desktop automation with mouse, keyboard, and screen control
Manage n8n workflows and automations via API. Use when working with n8n workflows, executions, or automation tasks - listing workflows, activating/deactivating, checking execution status, manually triggering workflows, or debugging automation issues.
Design and implement automation workflows to save time and scale operations as a solopreneur. Use when identifying repetitive tasks to automate, building workflows across tools, setting up triggers and actions, or optimizing existing automations. Covers automation opportunity identification, workflow design, tool selection (Zapier, Make, n8n), testing, and maintenance. Trigger on "automate", "automation", "workflow automation", "save time", "reduce manual work", "automate my business", "no-code automation".
Browser automation via Playwright MCP server. Navigate websites, click elements, fill forms, extract data, take screenshots, and perform full browser automation workflows.