Tags: skill-spotlight, ai-agents, agent-browser-clawdbot, clawhub, openclaw, browser-automation, developer-tools

Agent Browser Skill: Ref-Based Headless Browser Control Built for AI Agents

March 13, 2026·6 min read

14,934+ downloads and 19 stars on ClawHub. The agent-browser-clawdbot skill by @MaTriXy wraps Vercel Labs' agent-browser CLI — a headless browser built from the ground up for AI agents, not humans. The key insight: AI agents don't navigate browsers visually. They need structured, machine-readable snapshots and stable element references.

The Problem With Browser Tools for AI Agents

General-purpose browser tools give AI agents a screenshot and ask them to figure out what to click. That's fragile. Screenshots change when fonts load, layouts reflow, or ads appear. CSS selectors break when developers rename classes.

The agent-browser approach is different: it exposes the browser's accessibility tree, the same structured data screen readers use, as JSON snapshots with ref identifiers. The agent works with @e2, @e3, @e4: refs that address elements directly for as long as the snapshot they came from is current, which is why the workflow re-snapshots after every state change.

Core Workflow Pattern

Every agent-browser interaction follows the same loop:

# 1. Navigate
agent-browser open https://example.com
 
# 2. Snapshot the page (accessibility tree as JSON)
agent-browser snapshot -i --json
 
# 3. Agent parses JSON, identifies element refs
# 4. Interact using refs
agent-browser click @e2
agent-browser fill @e3 "search query"
 
# 5. Re-snapshot after state changes
agent-browser snapshot -i --json

The -i flag filters to interactive elements only. The --json flag outputs machine-readable data. Both flags should always be used together in agent workflows.
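
The loop above could be driven from agent code roughly as follows. This is a minimal Python sketch, not the skill's own implementation: the names `cli_runner`, `snapshot_refs`, and `search` are hypothetical, the snapshot shape is assumed to match the sample output in the Snapshots section, and picking the first textbox and first button is a simplification for illustration.

```python
import json
import subprocess
from typing import Callable, Dict, Sequence

# A runner takes agent-browser arguments and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def cli_runner(args: Sequence[str]) -> str:
    """Run the real agent-browser CLI (assumes it is installed and on PATH)."""
    result = subprocess.run(
        ["agent-browser", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def snapshot_refs(run: Runner) -> Dict[str, dict]:
    """Take an interactive-only JSON snapshot and return its refs mapping."""
    payload = json.loads(run(["snapshot", "-i", "--json"]))
    if not payload.get("success"):
        raise RuntimeError("snapshot failed")
    return payload["data"]["refs"]


def search(run: Runner, url: str, query: str) -> Dict[str, dict]:
    """Open a page, fill the first textbox, click the first button, re-snapshot."""
    run(["open", url])
    refs = snapshot_refs(run)
    textbox = next(r for r, el in refs.items() if el.get("role") == "textbox")
    button = next(r for r, el in refs.items() if el.get("role") == "button")
    run(["fill", f"@{textbox}", query])
    run(["click", f"@{button}"])
    # Refs taken before the click may no longer be valid, so snapshot again.
    return snapshot_refs(run)
```

Passing the runner in as a parameter keeps the loop testable without a browser; in real use it would be called as `search(cli_runner, "https://example.com", "search query")`.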

Key Commands

Navigation

agent-browser open https://site.com
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close

Snapshots

# Standard agent snapshot
agent-browser snapshot -i --json
 
# Compact format with depth limit
agent-browser snapshot -i -c -d 5 --json
 
# Scope to a specific section
agent-browser snapshot -s "#main" -i --json

Snapshot output format:

{
  "success": true,
  "data": {
    "refs": {
      "e1": {"role": "heading", "name": "Example Domain"},
      "e2": {"role": "button", "name": "Submit"},
      "e3": {"role": "textbox", "name": "Email"}
    }
  }
}

The agent reads refs, identifies the right element by role and name, then uses the ref ID for all subsequent interactions.
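
The role-and-name lookup described above might look like this in Python. It is a sketch under assumptions: `find_ref` is a hypothetical helper, the payload is assumed to have exactly the shape of the sample output, and the case-insensitive name match is a design choice of this example rather than documented CLI behavior.

```python
import json

# Sample payload copied from the snapshot output format shown above.
SNAPSHOT = """
{
  "success": true,
  "data": {
    "refs": {
      "e1": {"role": "heading", "name": "Example Domain"},
      "e2": {"role": "button", "name": "Submit"},
      "e3": {"role": "textbox", "name": "Email"}
    }
  }
}
"""


def find_ref(snapshot_json: str, role: str, name: str) -> str:
    """Return the '@eN' ref of the single element matching role and name."""
    payload = json.loads(snapshot_json)
    if not payload.get("success"):
        raise ValueError("snapshot reported failure")
    matches = [
        ref_id
        for ref_id, element in payload["data"]["refs"].items()
        if element.get("role") == role
        and (element.get("name") or "").lower() == name.lower()
    ]
    # Refuse to guess: zero or several matches means the agent should re-plan.
    if len(matches) != 1:
        raise LookupError(f"expected one {role} named {name!r}, found {len(matches)}")
    return "@" + matches[0]


print(find_ref(SNAPSHOT, "button", "Submit"))  # prints @e2
```

Raising on ambiguous matches, instead of taking the first hit, keeps the selection deterministic, which is the point of working with refs.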

Interactions

agent-browser click @e2
agent-browser fill @e3 "text to type"
agent-browser hover @e4
agent-browser check @e5
agent-browser select @e6 "option-value"
agent-browser press "Enter"
agent-browser scroll down 500
agent-browser drag @e7 @e8

Getting Information

agent-browser get text @e1 --json
agent-browser get html @e2 --json
agent-browser get attr @e4 "href" --json
agent-browser get url --json
agent-browser get count ".item" --json

Waiting for State

Critical for SPAs where content loads asynchronously:

agent-browser wait @e2                    # Wait for element to appear
agent-browser wait 1000                   # Explicit millisecond delay
agent-browser wait --text "Welcome"       # Wait for text content
agent-browser wait --url "**/dashboard"   # Wait for URL pattern
agent-browser wait --load networkidle     # Wait until network activity goes idle
agent-browser wait --fn "window.ready === true"   # Wait for a JavaScript condition to become true
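
If an agent builds these wait commands programmatically, a small helper can keep the six forms straight. The sketch below is illustrative only: `wait_args` is a hypothetical name, and it assumes one condition per `wait` call, as in the examples above. Whether the CLI accepts combined conditions is not covered here.

```python
from typing import List, Optional


def wait_args(
    *,
    ref: Optional[str] = None,
    ms: Optional[int] = None,
    text: Optional[str] = None,
    url: Optional[str] = None,
    load: Optional[str] = None,
    fn: Optional[str] = None,
) -> List[str]:
    """Build the argument list for exactly one `agent-browser wait` condition."""
    conditions = {"ref": ref, "ms": ms, "text": text, "url": url, "load": load, "fn": fn}
    given = [(kind, value) for kind, value in conditions.items() if value is not None]
    if len(given) != 1:
        raise ValueError("pass exactly one wait condition")
    kind, value = given[0]
    if kind == "ref":
        # Element refs are positional, e.g. `wait @e2`.
        return ["agent-browser", "wait", str(value)]
    if kind == "ms":
        # Millisecond delays are positional too, e.g. `wait 1000`.
        return ["agent-browser", "wait", str(int(value))]
    # The remaining conditions map directly to --text, --url, --load, --fn.
    return ["agent-browser", "wait", f"--{kind}", str(value)]
```

For example, `wait_args(load="networkidle")` yields the argument list for the network-idle wait, ready to hand to a process launcher.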

Session Management

One of agent-browser's most powerful features for AI agents: isolated named browser sessions.

# Two simultaneous sessions
agent-browser --session admin open app.com
agent-browser --session user open app.com
 
# List active sessions
agent-browser session list

Use cases:

Multi-account workflows — manage admin and user views simultaneously
A/B testing — compare two versions of a page in separate sessions
Parallel agents — multiple agents run isolated browsers without interference
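
The parallel use case could be sketched as below. The helper names (`session_cmd`, `run_cli`, `snapshot_all`) are hypothetical, and the sketch assumes the `--session` flag can precede any subcommand, as it does with `open` in the example above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def session_cmd(session: str, *args: str) -> List[str]:
    """Prefix an agent-browser command with a named session."""
    return ["agent-browser", "--session", session, *args]


def run_cli(cmd: List[str]) -> str:
    """Run a command and return its stdout (assumes agent-browser is installed)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def snapshot_all(
    sessions: Dict[str, str], run: Callable[[List[str]], str] = run_cli
) -> Dict[str, str]:
    """Open one URL per named session in parallel; return each snapshot's raw JSON."""

    def visit(item: Tuple[str, str]) -> Tuple[str, str]:
        name, url = item
        run(session_cmd(name, "open", url))
        return name, run(session_cmd(name, "snapshot", "-i", "--json"))

    # One worker per session; each worker only ever touches its own session.
    with ThreadPoolExecutor(max_workers=max(1, len(sessions))) as pool:
        return dict(pool.map(visit, sessions.items()))
```

Calling `snapshot_all({"admin": "app.com", "user": "app.com"})` would then return one snapshot per session, keyed by session name.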

State Persistence

Skip login flows by saving and loading browser state:

# After logging in manually or via script
agent-browser state save auth.json
 
# On subsequent runs — skip the login entirely
agent-browser state load auth.json
agent-browser open https://app.com/dashboard

The state file holds cookies, localStorage, and sessionStorage, so workflows that require authentication can skip the login on every run after the first.
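
The save-once, load-afterwards pattern can be wrapped in a single function. This is a sketch with hypothetical names: `ensure_authenticated` is not part of the skill, `run` stands for whatever function executes agent-browser commands, and `login` stands for the site-specific login steps.

```python
from pathlib import Path
from typing import Callable, List


def ensure_authenticated(
    state_file: Path,
    run: Callable[[List[str]], object],
    login: Callable[[], None],
) -> bool:
    """Reuse saved browser state if present; otherwise log in and save it.

    Returns True when saved state was reused, False when a login was performed.
    """
    if state_file.exists():
        run(["agent-browser", "state", "load", str(state_file)])
        return True
    login()  # site-specific: open the login page, fill credentials, submit
    run(["agent-browser", "state", "save", str(state_file)])
    return False
```

The first run performs the login and writes the file; later runs take the short path and go straight to the authenticated page.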

Network Control

# Block ads and trackers
agent-browser network route "**/ads/*" --abort
 
# Mock an API response
agent-browser network route "**/api/products" --body '{"items": []}'
 
# Inspect what requests fired
agent-browser network requests --filter api

Useful for testing against controlled API responses, or cleaning up noisy pages before snapshotting.
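
When an agent issues these commands from code rather than from a shell, hand-quoting the JSON body is easy to get wrong. The sketch below assumes commands are launched as argument lists (for example through Python's `subprocess.run`), so no shell quoting is involved; `mock_route` and `block_route` are hypothetical helper names.

```python
import json
from typing import Any, List


def mock_route(pattern: str, payload: Any) -> List[str]:
    """Arguments for answering requests that match `pattern` with a JSON body."""
    # json.dumps guarantees the body is valid JSON regardless of its contents.
    return ["agent-browser", "network", "route", pattern, "--body", json.dumps(payload)]


def block_route(pattern: str) -> List[str]:
    """Arguments for aborting requests that match `pattern`."""
    return ["agent-browser", "network", "route", pattern, "--abort"]


print(mock_route("**/api/products", {"items": []})[-1])  # prints {"items": []}
```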

Installation

npm install -g agent-browser
agent-browser install                    # Downloads Chromium
agent-browser install --with-deps        # Linux: includes system dependencies

ClawHub install adds the SKILL.md to your OpenClaw workspace:

clawhub install agent-browser-clawdbot

When to Use This vs. Built-in Browser Tools

Scenario	agent-browser	Built-in browser
Multi-step workflows	✅ Optimal	⚠️ Fragile
Deterministic element selection	✅ Ref-based	❌ Visual/CSS
Complex SPAs	✅ Handles well	⚠️ Unreliable
Session isolation	✅ Named sessions	❌ Single session
Screenshots/PDFs for analysis	⚠️ Secondary	✅ Purpose-built
Visual inspection	❌ Not designed for	✅ Optimal
Browser extension integration	❌ Not supported	✅ Supported

Practical Tips

Always use wait --load networkidle after navigation on SPAs — don't snapshot until the page has settled.
Scope snapshots with -s when you know which section of the page has the elements you need — reduces noise and speeds up parsing.
Save auth state after setup — if an agent authenticates once, persist the state and skip auth on all subsequent runs.
Use headed mode during development: pass --headed, or set the environment variable as in AGENT_BROWSER_HEADED=true agent-browser ..., to show the browser window so you can see exactly what the agent is doing.
Re-snapshot after every interaction: refs can go stale when a click or form submission changes the DOM, so the agent should take a fresh snapshot before reusing any ref.
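
The last tip can be enforced in code rather than left to the agent's discipline. A minimal sketch, assuming the snapshot payload shape shown earlier; `RefTracker` and `StaleRefError` are hypothetical names, and invalidating after every use is deliberately conservative.

```python
from typing import Optional, Set


class StaleRefError(RuntimeError):
    """Raised when a ref is used without a fresh snapshot."""


class RefTracker:
    """Allows one interaction per snapshot, forcing a re-snapshot in between."""

    def __init__(self) -> None:
        self._fresh: Optional[Set[str]] = None

    def record_snapshot(self, payload: dict) -> None:
        """Remember which ref IDs the latest snapshot contained."""
        self._fresh = set(payload["data"]["refs"])

    def use(self, ref: str) -> str:
        """Validate a ref such as '@e2' before an interaction, then invalidate all refs."""
        if self._fresh is None:
            raise StaleRefError(f"re-snapshot before using {ref}")
        if ref.lstrip("@") not in self._fresh:
            raise KeyError(ref)
        self._fresh = None  # the interaction may change the DOM
        return ref
```

An agent would call `record_snapshot` after each snapshot and wrap every ref it passes to a mutating command such as `click` or `fill` in `use`.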

Considerations

Intermediate complexity — agents need to reliably parse JSON snapshots and map role/name fields to the right refs. Simpler browser tools have lower cognitive overhead.
Chromium required — agent-browser install downloads Chromium (~200MB). Not suitable for environments where browser installation isn't allowed.
Not all web elements are accessible — sites with poor accessibility implementations may have elements that don't appear in the tree or lack useful name fields.
Session limits — very long-running sessions accumulate memory. Periodic session restart is good practice for high-volume automation.
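
The accessibility caveat can be checked up front. The sketch below flags refs whose `name` is missing or blank so the agent can fall back to another strategy before acting; `refs_without_names` is a hypothetical helper, and the sample data is invented for illustration.

```python
from typing import Dict, List


def refs_without_names(refs: Dict[str, dict]) -> List[str]:
    """Return the IDs of refs that have no usable accessible name."""
    return [
        ref_id
        for ref_id, element in refs.items()
        if not (element.get("name") or "").strip()
    ]


# Invented sample: one well-labelled element and two poorly labelled ones.
sample_refs = {
    "e1": {"role": "button", "name": "Submit"},
    "e2": {"role": "button", "name": ""},
    "e3": {"role": "textbox"},
}
print(refs_without_names(sample_refs))  # prints ['e2', 'e3']
```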

The Bigger Picture

The agent-browser-clawdbot skill reflects a broader shift: browser automation is being redesigned for machines, not adapted from tools built for humans. The accessibility tree is the right abstraction — it's stable, semantic, and already exists in every browser. Refs eliminate the brittleness of CSS selectors and the ambiguity of screenshots.

As AI agents take on more complex multi-step web tasks, this kind of structured, deterministic browser control becomes essential infrastructure.

View the skill on ClawHub: agent-browser-clawdbot
