Agent Browser Skill: Ref-Based Headless Browser Control Built for AI Agents
14,934+ downloads and 19 stars on ClawHub. The agent-browser-clawdbot skill by @MaTriXy wraps Vercel Labs' agent-browser CLI — a headless browser built from the ground up for AI agents, not humans. The key insight: AI agents don't navigate browsers visually. They need structured, machine-readable snapshots and stable element references.
The Problem With Browser Tools for AI Agents
General-purpose browser tools give AI agents a screenshot and ask them to figure out what to click. That's fragile. Screenshots change when fonts load, layouts reflow, or ads appear. CSS selectors break when developers rename classes.
The agent-browser approach is different: it exposes the browser's accessibility tree, the same structured data screen readers use, as JSON snapshots with ref identifiers. The agent works with @e2, @e3, @e4 instead of pixels or selectors. A ref points at exactly one element in the snapshot it came from; because refs can change once the page changes, the agent takes a fresh snapshot after clicks and form submissions.
Core Workflow Pattern
Every agent-browser interaction follows the same loop:
# 1. Navigate
agent-browser open https://example.com
# 2. Snapshot the page (accessibility tree as JSON)
agent-browser snapshot -i --json
# 3. Agent parses JSON, identifies element refs
# 4. Interact using refs
agent-browser click @e2
agent-browser fill @e3 "search query"
# 5. Re-snapshot after state changes
agent-browser snapshot -i --json
The -i flag filters to interactive elements only. The --json flag outputs machine-readable data. Both flags should always be used together in agent workflows.
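The loop above could be driven from an agent process roughly as follows. This is a minimal sketch, not the skill's own implementation: it assumes `agent-browser` is on the PATH and prints the snapshot JSON (in the format documented in this article) on stdout, and the names `run_cli`, `click_by_role_and_name`, and `fake_run` are hypothetical helpers introduced here. A recording stub stands in for the CLI so the sketch runs without a browser installed.

```python
import json
import subprocess
from typing import Callable, Dict, List

Runner = Callable[[List[str]], str]

def run_cli(args: List[str]) -> str:
    """Run agent-browser with the given arguments and return its stdout."""
    done = subprocess.run(["agent-browser", *args],
                          capture_output=True, text=True, check=True)
    return done.stdout

def click_by_role_and_name(url: str, role: str, name: str,
                           run: Runner = run_cli) -> Dict:
    """Open url, click the element matching role/name, return the fresh snapshot."""
    run(["open", url])                                     # 1. navigate
    snap = json.loads(run(["snapshot", "-i", "--json"]))   # 2. snapshot
    refs = snap["data"]["refs"]                            # 3. parse refs (assumed layout)
    for ref_id, node in refs.items():
        if node.get("role") == role and node.get("name") == name:
            run(["click", f"@{ref_id}"])                   # 4. interact by ref
            return json.loads(run(["snapshot", "-i", "--json"]))  # 5. re-snapshot
    raise LookupError(f"no {role!r} named {name!r} in snapshot")

# Dry run with a recording stub, so the sketch executes without the CLI installed.
calls: List[List[str]] = []

def fake_run(args: List[str]) -> str:
    calls.append(args)
    if args[0] == "snapshot":
        return ('{"success": true, "data": {"refs": '
                '{"e2": {"role": "button", "name": "Submit"}}}}')
    return ""

after = click_by_role_and_name("https://example.com", "button", "Submit",
                               run=fake_run)
```

Passing the runner as a parameter keeps the parsing logic testable; in real use, omit `run=` to call the CLI through `subprocess`.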
Key Commands
Navigation
agent-browser open https://site.com
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close
Snapshots
# Standard agent snapshot
agent-browser snapshot -i --json
# Compact format with depth limit
agent-browser snapshot -i -c -d 5 --json
# Scope to a specific section
agent-browser snapshot -s "#main" -i --json
Snapshot output format:
{
  "success": true,
  "data": {
    "refs": {
      "e1": {"role": "heading", "name": "Example Domain"},
      "e2": {"role": "button", "name": "Submit"},
      "e3": {"role": "textbox", "name": "Email"}
    }
  }
}
The agent reads refs, identifies the right element by role and name, then uses the ref ID for all subsequent interactions.
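The role-and-name lookup described above can be sketched in a few lines of Python. The function name `find_ref` is a hypothetical helper, and the sample payload is the one from this article; real snapshots may carry additional fields.

```python
import json
from typing import Optional

def find_ref(snapshot: dict, role: str, name: str) -> Optional[str]:
    """Return the '@eN' ref whose role and name both match, or None."""
    refs = snapshot.get("data", {}).get("refs", {})
    for ref_id, node in refs.items():
        if node.get("role") == role and node.get("name") == name:
            return f"@{ref_id}"
    return None

# Sample snapshot copied from the output format shown in this article.
raw = """
{
  "success": true,
  "data": {
    "refs": {
      "e1": {"role": "heading", "name": "Example Domain"},
      "e2": {"role": "button", "name": "Submit"},
      "e3": {"role": "textbox", "name": "Email"}
    }
  }
}
"""
snapshot = json.loads(raw)
submit_ref = find_ref(snapshot, "button", "Submit")
print(submit_ref)
```

The returned string can be passed straight to commands such as `agent-browser click`. Returning `None` rather than raising lets the caller decide whether to wait and re-snapshot.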
Interactions
agent-browser click @e2
agent-browser fill @e3 "text to type"
agent-browser hover @e4
agent-browser check @e5
agent-browser select @e6 "option-value"
agent-browser press "Enter"
agent-browser scroll down 500
agent-browser drag @e7 @e8
Getting Information
agent-browser get text @e1 --json
agent-browser get html @e2 --json
agent-browser get attr @e4 "href" --json
agent-browser get url --json
agent-browser get count ".item" --json
Waiting for State
Critical for SPAs where content loads asynchronously:
agent-browser wait @e2 # Wait for element to appear
agent-browser wait 1000 # Explicit millisecond delay
agent-browser wait --text "Welcome" # Wait for text content
agent-browser wait --url "**/dashboard" # Wait for URL pattern
agent-browser wait --load networkidle # Wait for all network requests
agent-browser wait --fn "window.ready === true"
Session Management
One of agent-browser's most powerful features for AI agents: isolated named browser sessions.
# Two simultaneous sessions
agent-browser --session admin open app.com
agent-browser --session user open app.com
# List active sessions
agent-browser session list
Use cases:
- Multi-account workflows — manage admin and user views simultaneously
- A/B testing — compare two versions of a page in separate sessions
- Parallel agents — multiple agents run isolated browsers without interference
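For a multi-account workflow, an agent might build its commands through a small helper so that every call stays bound to one session. This is a sketch only: `session_cmd` is a hypothetical name, and the only CLI syntax it relies on is the `--session NAME` prefix shown above.

```python
import shlex
from typing import List

def session_cmd(session: str, *args: str) -> List[str]:
    """Build an agent-browser argument list bound to one named session."""
    return ["agent-browser", "--session", session, *args]

# One command list per account; each could be passed to subprocess.run.
admin_open = session_cmd("admin", "open", "app.com")
user_open = session_cmd("user", "open", "app.com")

print(shlex.join(admin_open))
print(shlex.join(user_open))
```

Keeping the session name in one place makes it harder for a later click or fill to land in the wrong browser by accident.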
State Persistence
Skip login flows by saving and loading browser state:
# After logging in manually or via script
agent-browser state save auth.json
# On subsequent runs — skip the login entirely
agent-browser state load auth.json
agent-browser open https://app.com/dashboard
Saves cookies, localStorage, and sessionStorage. Significant time savings for workflows that require authentication.
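The save-once, load-afterwards decision could look like this in an agent script. It is a sketch under assumptions: `auth_plan` is a hypothetical helper, the login URL is made up for illustration, and a real login flow would add fill and click steps before saving.

```python
from pathlib import Path
from typing import List

Command = List[str]

def auth_plan(state_file: str, start_url: str,
              login_steps: List[Command]) -> List[Command]:
    """Return the agent-browser argument lists to run, in order."""
    if Path(state_file).exists():
        # Saved state found: restore it and go straight to the target page.
        return [["state", "load", state_file], ["open", start_url]]
    # First run: perform the login steps, then persist the session.
    return [*login_steps, ["state", "save", state_file]]

# Hypothetical login flow; the URL is an illustrative assumption.
login = [["open", "https://app.com/login"]]
plan = auth_plan("auth.json", "https://app.com/dashboard", login)
for args in plan:
    print("agent-browser", *args)
```

Returning a plan instead of executing commands directly keeps the branch logic easy to inspect before anything touches the browser.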
Network Control
# Block ads and trackers
agent-browser network route "**/ads/*" --abort
# Mock an API response
agent-browser network route "**/api/products" --body '{"items": []}'
# Inspect what requests fired
agent-browser network requests --filter api
Useful for testing against controlled API responses, or cleaning up noisy pages before snapshotting.
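When the mocked body is generated rather than typed by hand, serializing it with `json.dumps` and passing it as a single argument avoids shell-quoting mistakes. The helper name `mock_route_cmd` is hypothetical; the command shape mirrors the `network route ... --body` example above.

```python
import json
from typing import Any, List

def mock_route_cmd(pattern: str, payload: Any) -> List[str]:
    """Build a 'network route' command that answers pattern with payload as JSON."""
    return ["agent-browser", "network", "route", pattern,
            "--body", json.dumps(payload)]

# Same mock as the shell example: an empty product list.
cmd = mock_route_cmd("**/api/products", {"items": []})
print(cmd)
```

Passing the list to `subprocess.run` without `shell=True` hands the body to the CLI as one argument, regardless of the quotes or spaces it contains.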
Installation
npm install -g agent-browser
agent-browser install # Downloads Chromium
agent-browser install --with-deps # Linux: includes system dependencies
ClawHub install adds the SKILL.md to your OpenClaw workspace:
clawhub install agent-browser-clawdbot
When to Use This vs. Built-in Browser Tools
| Scenario | agent-browser | Built-in browser |
|---|---|---|
| Multi-step workflows | ✅ Optimal | ⚠️ Fragile |
| Deterministic element selection | ✅ Ref-based | ❌ Visual/CSS |
| Complex SPAs | ✅ Handles well | ⚠️ Unreliable |
| Session isolation | ✅ Named sessions | ❌ Single session |
| Screenshots/PDFs for analysis | ⚠️ Secondary | ✅ Purpose-built |
| Visual inspection | ❌ Not designed for | ✅ Optimal |
| Browser extension integration | ❌ Not supported | ✅ Supported |
Practical Tips
- Always use wait --load networkidle after navigation on SPAs: don't snapshot until the page has settled.
- Scope snapshots with -s when you know which section of the page has the elements you need. This reduces noise and speeds up parsing.
- Save auth state after setup: if an agent authenticates once, persist the state and skip auth on all subsequent runs.
- Use --headed during development: AGENT_BROWSER_HEADED=true agent-browser ... shows the browser window so you can see exactly what the agent is doing.
- Re-snapshot after every interaction: refs can change after clicks and form submissions, so the agent should re-snapshot before assuming refs are still valid.
Considerations
- Intermediate complexity: agents need to reliably parse JSON snapshots and map role/name fields to the right refs. Simpler browser tools have lower cognitive overhead.
- Chromium required: agent-browser install downloads Chromium (~200MB). Not suitable for environments where browser installation isn't allowed.
- Not all web elements are accessible: sites with poor accessibility implementations may have elements that don't appear in the tree or lack useful name fields.
- Session limits: very long-running sessions accumulate memory. Periodic session restart is good practice for high-volume automation.
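One way to cope with poorly labelled pages is to separate refs that carry a usable name from those that do not, so the agent can fall back to another strategy for the latter. This sketch assumes an unlabelled element shows up with a missing or empty `name` field, which the source does not confirm; `split_by_name` is a hypothetical helper.

```python
from typing import Dict, Tuple

def split_by_name(snapshot: Dict) -> Tuple[Dict, Dict]:
    """Partition refs into (named, unnamed) by whether 'name' is non-empty."""
    refs = snapshot.get("data", {}).get("refs", {})
    named: Dict = {}
    unnamed: Dict = {}
    for ref_id, node in refs.items():
        # Assumption: a missing, empty, or whitespace-only name means "unlabelled".
        target = named if (node.get("name") or "").strip() else unnamed
        target[ref_id] = node
    return named, unnamed

# Illustrative data, not real CLI output.
sample = {"success": True, "data": {"refs": {
    "e1": {"role": "button", "name": "Submit"},
    "e2": {"role": "button", "name": ""},
    "e3": {"role": "link"},
}}}
named, unnamed = split_by_name(sample)
print(sorted(named), sorted(unnamed))
```

If the unnamed set is large, scoping the snapshot with `-s` or reading text with `get text` on candidate refs may help disambiguate.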
The Bigger Picture
The agent-browser-clawdbot skill reflects a broader shift: browser automation is being redesigned for machines, not adapted from tools built for humans. The accessibility tree is the right abstraction — it's stable, semantic, and already exists in every browser. Refs eliminate the brittleness of CSS selectors and the ambiguity of screenshots.
As AI agents take on more complex multi-step web tasks, this kind of structured, deterministic browser control becomes essential infrastructure.
View the skill on ClawHub: agent-browser-clawdbot