agent-browser: The Rust-Powered Browser CLI That Saves 93% of Your AI Agent's Context

March 7, 2026 · 8 min read

When AI agents interact with the web, they have a token problem.

Tools like Playwright MCP dump raw HTML or verbose JSON into the context window. A typical page snapshot can consume 10,000–50,000 tokens before the agent even starts reasoning about what to do. On complex SPAs, that's your entire context window gone on one page load.

agent-browser by @TheSethRose takes a fundamentally different approach: instead of giving the agent a wall of HTML, it outputs a compact accessibility tree — a semantic representation of interactive elements, each identified by a stable short reference like @e1, @e2. The result, according to benchmarks from Vercel Labs (which maintains the underlying CLI), is a 93% reduction in context window usage compared to Playwright MCP.

With 74,000+ downloads on ClawHub and 14,000 GitHub stars, it's become the de facto browser automation choice for production AI agent workflows.

Architecture: Why Rust?

Most browser automation tools — Playwright, Puppeteer, Selenium — are built around a Node.js (or Python) runtime that communicates with a browser via the Chrome DevTools Protocol (CDP). This works, but it comes with overhead: startup time, memory footprint, and processing latency on every command.

agent-browser uses a two-tier architecture:

┌──────────────────────┐
│   Native Rust CLI    │  ← sub-millisecond command parsing
│   (agent-browser)    │
└──────────┬───────────┘
           │ CDP
┌──────────▼───────────┐
│   Node.js daemon     │  ← persistent browser instance (Playwright)
│   (manages Chrome)   │
└──────────────────────┘

The Rust CLI handles command parsing at near-instant speed and routes to a persistent Node.js daemon that maintains a running Playwright browser instance. The daemon stays alive between commands — no cold-start cost per action.

When the native binary isn't available (non-x86_64 Linux, some ARM environments), the system transparently falls back to a pure Node.js path. The API is identical; you just lose the sub-millisecond command parsing of the Rust CLI.

The Core Innovation: Accessibility Tree + Stable Refs

The critical insight in agent-browser's design is this: AI agents don't need raw HTML. They need a structured description of what's interactive and what it means.

Running agent-browser snapshot doesn't return a DOM dump. It returns an accessibility tree — the same representation that screen readers use — annotated with short stable refs:

Document
  main
    heading "Sign in to GitHub" @e1
    group "Sign in with a passkey"
      button "Sign in with a passkey" @e2
    group
      labelText "Username or email address" @e3
        textbox "Username or email address" @e4
      labelText "Password" @e5
        textbox "Password" @e6
      link "Forgot password?" @e7
      button "Sign in" @e8
    group "New to GitHub?"
      link "Create an account" @e9

This is dramatically more compact than HTML. The agent can now reason about what's on screen — headings, buttons, inputs, links — and act on elements by ref:

agent-browser fill @e4 "myusername"
agent-browser fill @e6 "mypassword"
agent-browser click @e8

No fragile CSS selectors. No XPath. No "find the third div with class btn-primary." Refs are stable across re-renders because they're derived from the accessibility structure, not the DOM position.
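To make the ref workflow concrete, here is a minimal shell sketch of how an agent-side script might pick refs out of snapshot text. It assumes each line has the format shown above (role, quoted name, then @eN). The find_ref helper is hypothetical and not part of agent-browser, and the snapshot text is copied from the example rather than fetched from a live page.

```shell
#!/usr/bin/env bash
# Sample snapshot text, copied from the example above (not live output).
snapshot='Document
  main
    heading "Sign in to GitHub" @e1
    group "Sign in with a passkey"
      button "Sign in with a passkey" @e2
    group
      labelText "Username or email address" @e3
        textbox "Username or email address" @e4
      labelText "Password" @e5
        textbox "Password" @e6
      link "Forgot password?" @e7
      button "Sign in" @e8
    group "New to GitHub?"
      link "Create an account" @e9'

# find_ref ROLE NAME: hypothetical helper that prints the ref of the first
# line whose role and quoted name match exactly.
find_ref() {
  printf '%s\n' "$snapshot" \
    | grep -F -- "$1 \"$2\" @" \
    | grep -o '@e[0-9][0-9]*$' \
    | head -n 1
}

user_ref=$(find_ref textbox "Username or email address")
submit_ref=$(find_ref button "Sign in")

# Print the commands instead of running them, so the sketch works without a browser.
echo "agent-browser fill $user_ref \"myusername\""
echo "agent-browser click $submit_ref"
```

Matching on the closing quote followed by the ref keeps button "Sign in" from also matching button "Sign in with a passkey".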

Full Command Reference

Installation

# Global install (recommended for production)
npm install -g agent-browser
agent-browser install
 
# Linux: install system dependencies
agent-browser install --with-deps
 
# macOS via Homebrew
brew install agent-browser
 
# Quick test without installing
npx agent-browser open example.com

Note: npx routing is "noticeably slower" than a global install due to Node.js intermediary overhead. For agents running many commands, install globally.

Navigation

agent-browser open example.com          # open in default browser
agent-browser goto https://github.com   # navigate current tab
agent-browser navigate back             # browser back
agent-browser navigate forward
agent-browser navigate reload

Snapshots & Screenshots

agent-browser snapshot                  # accessibility tree (AI-optimized, compact)
agent-browser snapshot --interactive    # show only interactive elements
agent-browser screenshot                # PNG screenshot
agent-browser screenshot --annotate     # screenshot with numbered element overlays
agent-browser screenshot --full         # full-page screenshot
agent-browser pdf output.pdf            # export page as PDF

Element Interaction (by ref)

agent-browser click @e2                 # click element from snapshot
agent-browser dblclick @e5             # double click
agent-browser fill @e4 "value"         # clear + type into input
agent-browser type @e4 "value"         # type without clearing
agent-browser press @e4 Enter          # key press on element
agent-browser hover @e3                # hover (for dropdowns, tooltips)
agent-browser select @e7 "option"      # select dropdown option
agent-browser check @e9                # check checkbox
agent-browser uncheck @e9              # uncheck checkbox

Semantic Finders (no ref needed)

# Find by ARIA role
agent-browser find role button --name "Submit" click
agent-browser find role textbox --name "Email" fill "user@example.com"
 
# Find by visible text
agent-browser find text "Sign in" click
 
# Find by label
agent-browser find label "Password" fill "secret"
 
# Find by placeholder
agent-browser find placeholder "Search..." type "query"

State & Auth Persistence

# Save browser auth state (cookies, localStorage)
agent-browser state save ./auth.json
 
# Load saved state (persist login across agent sessions)
agent-browser state load ./auth.json

JavaScript Execution

agent-browser eval "document.title"
agent-browser eval "window.scrollY"
 
# Base64 for complex scripts
agent-browser eval --base64 "cmV0dXJuIGRvY3VtZW50LnF1ZXJ5U2VsZWN0b3JBbGwoJ2EnKS5sZW5ndGg="
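The payload above decodes to return document.querySelectorAll('a').length. As a minimal sketch, a payload like this could be produced with the standard base64 utility. The variable names are illustrative, and the final command is only printed, not executed.

```shell
#!/usr/bin/env bash
# The script to run in the page; single quotes inside are what make
# plain shell quoting awkward and base64 convenient.
script="return document.querySelectorAll('a').length"

# Encode without a trailing newline; strip any line wrapping base64 may add.
encoded=$(printf '%s' "$script" | base64 | tr -d '\n')

# Print the resulting command rather than running it.
echo "agent-browser eval --base64 \"$encoded\""
```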

Network Interception

# Mock API responses for testing
agent-browser network route "*/api/user" --body '{"name":"test"}'
agent-browser network unroute "*/api/user"

Viewport & Device Emulation

agent-browser set viewport 375 812       # iPhone dimensions
agent-browser set device "iPhone 14 Pro"
agent-browser set geo 37.7749 -122.4194  # San Francisco

Visual Diffs & Debugging

agent-browser snapshot --diff ./prev.json   # compare against saved snapshot
agent-browser screenshot --diff ./prev.png   # pixel diff
agent-browser connect 9222                   # attach to existing Chrome via CDP port

Comparing agent-browser to Playwright MCP and Puppeteer

	agent-browser	Playwright MCP	Puppeteer
Primary language	Rust CLI + Node.js daemon	Node.js	Node.js
Context window usage	~93% less than Playwright MCP	High (HTML dumps)	High
Output format	Compact accessibility tree	Raw HTML / JSON	Raw HTML
Element selection	Stable refs (`@e1`) + semantic finders	CSS selectors / XPath	CSS selectors
Startup speed	Sub-millisecond (Rust)	~1-2s	~1-2s
Auth persistence	Built-in (`state save/load`)	Manual	Manual
AI agent design	Native — built for agents	Adaptation	Not designed for agents
Browser support	Chromium (via Playwright backend)	Chromium, Firefox, WebKit	Chromium, Firefox

The key trade-off: agent-browser wins decisively on context efficiency and agent ergonomics; Playwright wins if you need Firefox/WebKit or the full Playwright API surface.

Real Use Cases

1. Web Data Extraction Without Fragile Selectors

Before agent-browser, extracting structured data from a complex SPA meant either writing bespoke Playwright scripts (brittle, break on UI updates) or using a browser extension. With agent-browser:

agent-browser state load ./auth.json   # restore the saved login first
agent-browser open https://app.example.com/dashboard
agent-browser snapshot                 # agent reads the accessible structure
# agent identifies @e12 as the revenue table
agent-browser get text @e12            # extract text content

No CSS selector archaeology. When the UI updates, the accessibility structure stays semantically consistent.

2. Form Automation Across Sessions

# Session 1: authenticate and save state
agent-browser open https://portal.example.com
agent-browser fill @e3 "user@example.com"
agent-browser fill @e5 "password"
agent-browser click @e7
agent-browser state save ./portal-auth.json
 
# All future sessions: skip login
agent-browser state load ./portal-auth.json
agent-browser goto https://portal.example.com/submit

3. Visual Regression Monitoring

# Baseline
agent-browser open https://myapp.com
agent-browser screenshot baseline.png
 
# After deployment
agent-browser open https://myapp.com
agent-browser screenshot current.png --diff baseline.png
# agent-browser reports pixel diff percentage

4. AI Agent Web Research Pipelines

The accessibility tree format is designed to be processed by LLMs directly. An agent can:

- Run snapshot on a page to understand its structure
- Extract the relevant @ref identifiers
- Issue targeted click or get text commands
- Follow links and repeat, all within a single context window rather than across multiple full HTML dumps
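The ref-extraction step in this loop can be sketched in a few lines of shell. The snapshot excerpt below is hypothetical (made up for illustration in the format shown earlier), and the commands are printed rather than executed, so nothing here depends on a running browser.

```shell
#!/usr/bin/env bash
# Hypothetical snapshot excerpt; a real one would come from `agent-browser snapshot`.
snapshot='    heading "Search results" @e1
    link "Result one" @e2
    link "Result two" @e3
    button "Next page" @e4'

# Collect the ref of every line whose role is "link".
link_refs=$(printf '%s\n' "$snapshot" | grep -E '^ *link ' | grep -o '@e[0-9][0-9]*$')

# Emit one targeted command per link (printed, not executed).
for ref in $link_refs; do
  echo "agent-browser get text $ref"
done
```

Only the refs and the short text they point at need to enter the agent's context; the rest of the page never does.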

Installation via ClawHub

clawhub install agent-browser

The skill wraps the agent-browser CLI with OpenClaw-specific conventions: the allowed-tools config pre-authorizes Bash(agent-browser:*) so the agent can issue browser commands without per-command approval.

Verify the skill is active:

agent-browser --version
agent-browser install   # installs the browser binary if not present

Before installing, check the skill's current safety status at clawhub.ai/skills/agent-browser.

Frequently Asked Questions

Does it support Firefox or Safari?

Currently Chromium only (via the Playwright backend). Firefox support is on the roadmap. For cross-browser testing, Playwright MCP remains the better choice.

Does the npx path work for quick testing?

Yes, but it's "noticeably slower" per the official docs. For one-off experiments it's fine; for production agent workflows that issue many browser commands, install globally.

How does auth state persistence work across agent sessions?

agent-browser state save ./auth.json captures cookies, localStorage, and session storage. agent-browser state load restores it. This means an agent can log in once and reuse the session across many runs — useful for agents that need authenticated access to web apps.
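One way to wire this into an agent's startup is sketched below, under stated assumptions. The start_session function and the AB and STATE_FILE variables are hypothetical names, not part of agent-browser. AB defaults to echo, so the sketch is a dry run that only prints the commands; setting AB=agent-browser would execute them.

```shell
#!/usr/bin/env bash
AB=${AB:-echo}                         # dry run by default; set AB=agent-browser to execute
STATE_FILE=${STATE_FILE:-./auth.json}  # where the saved state lives

# start_session: reuse saved state when it exists, otherwise log in and save it.
start_session() {
  if [ -f "$STATE_FILE" ]; then
    "$AB" state load "$STATE_FILE"
  else
    echo "no saved state at $STATE_FILE; logging in" >&2
    "$AB" open https://portal.example.com
    # Login steps (fill and click by ref) would go here.
    "$AB" state save "$STATE_FILE"
  fi
}

start_session
```

Because the state file holds live session credentials, it is worth treating like any other secret: keep it out of version control and restrict its file permissions.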

Can it handle SPAs that load content asynchronously?

Yes. The underlying Playwright daemon waits for network idle and DOM stability before returning snapshot results. You can also use agent-browser wait commands for explicit waits.

What's the --annotate flag on screenshots?

It overlays numbered labels on every interactive element, matching the @e1, @e2 refs from the snapshot. Useful for debugging — you can visually verify which ref corresponds to which element.
