
Browser Automation: Natural Language Web Control with Local and Cloud Browser Support

March 11, 2026 · 6 min read

18,238 downloads, 217 installs, 23 stars. The Browser Automation skill by @peytoncasper turns natural language into browser actions using the Stagehand CLI, and it switches automatically between local Chrome and cloud-based Browserbase depending on which API keys are set in your environment.

No Selenium. No page object models. No CSS selectors. Just commands like browser act "click the Sign In button".

The Problem It Solves

Browser automation frameworks have always required you to know the structure of the page you're automating — element IDs, CSS selectors, XPath expressions. When the page updates, your selectors break. When you encounter a login wall or CAPTCHA, you're blocked.

The Browser Automation skill flips this. Instead of you telling the browser where to click, you tell it what to do. Stagehand uses AI to interpret the current page state and translate your instruction into the right action. The underlying elements are its problem, not yours.

How It Works

The skill wraps Stagehand, an AI-native browser automation library. It exposes six commands:

browser navigate <url>                 # Go to a URL
browser act "<action>"                 # Natural language action
browser extract "<instruction>" ['{}'] # Extract data (optional JSON schema)
browser observe "<query>"              # Discover available elements
browser screenshot                     # Capture current state
browser close                          # Close the browser

Environment Auto-Detection

The skill reads your environment and picks the right browser:

Browserbase — if BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID are set
Local Chrome — if no Browserbase keys are present

No prompting, no extra configuration. The skill works with whatever keys your environment already provides.
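
To make the rule concrete, here is a minimal sketch of the same decision in shell. It is not the skill's own code: detect_browser_mode is a hypothetical helper, and treating a half-configured environment (only one of the two variables set) as local is an assumption, since the rule above only covers the two clear cases.

```shell
# Sketch of the selection rule described above; not the skill's own code.
# detect_browser_mode is a hypothetical helper name.
detect_browser_mode() {
  # Both variables must be non-empty for Browserbase to be chosen.
  # Assumption: if only one is set, fall back to local Chrome.
  if [ -n "${BROWSERBASE_API_KEY:-}" ] && [ -n "${BROWSERBASE_PROJECT_ID:-}" ]; then
    echo "browserbase"
  else
    echo "local"
  fi
}
```

Running detect_browser_mode before a long task is a cheap way to confirm which backend your current shell would select.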

Setup

npm install    # Install dependencies
npm link       # Create global 'browser' command
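
If an agent script depends on the linked command, a quick preflight check can catch a missing setup step early. The sketch below uses a hypothetical helper, require_cli, which only checks that a command name resolves in the current shell; it says nothing about whether the skill itself is healthy.

```shell
# require_cli is a hypothetical helper: it only checks that a command
# name resolves in the current shell, nothing more.
require_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "'$1' not found; run the setup steps (npm install, npm link) first" >&2
  return 1
}

# Usage: require_cli browser || exit 1
```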

Walkthrough

A typical agent interaction:

# Navigate to a site
browser navigate https://example.com
 
# Perform an action in natural language
browser act "click the Sign In button"
browser act "type 'user@example.com' into the email field"
browser act "click Submit"
 
# Extract structured data
browser extract "get all product names and prices" '{"products": [{"name": "string", "price": "string"}]}'
 
# Verify the page state
browser screenshot
 
# Clean up
browser close

The observe command is useful for debugging — it discovers what interactive elements are available on the current page:

browser observe "what buttons or forms are visible?"
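
The extract call in the walkthrough carries a JSON schema inline, which is easy to break with a stray quote. One way to keep it manageable, sketched here with the hypothetical names PRODUCT_SCHEMA and extract_products, is to hold the schema in a variable and pass it as a single quoted argument.

```shell
# The schema from the walkthrough, kept in one place so the quoting is
# handled once. PRODUCT_SCHEMA and extract_products are hypothetical names.
PRODUCT_SCHEMA='{"products": [{"name": "string", "price": "string"}]}'

extract_products() {
  # Quote the variable so the shell passes the schema as one argument
  # instead of splitting it on spaces.
  browser extract "get all product names and prices" "$PRODUCT_SCHEMA"
}
```

The instruction and schema are the ones shown above; only the wrapper is new.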

Local vs. Browserbase

Feature	Local Chrome	Browserbase
Speed	Faster	Slightly slower
Setup	Chrome required	API key required
Stealth mode	No	Yes
Proxy support	No	Yes
CAPTCHA handling	No	Yes
Best for	Development	Production / scraping
Cost	Free	Usage-based

For development and testing, local mode is faster and simpler. For production workflows where you need to scrape sites that actively block bots, Browserbase provides stealth and proxy support.
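
Because the mode follows from the environment, you can force local Chrome for a single command even when Browserbase keys are exported. The wrapper below is a sketch that relies only on the detection rule described in this article; browser_local is a hypothetical name.

```shell
# browser_local is a hypothetical wrapper. It relies only on the detection
# rule described in this article: without the two Browserbase variables,
# the skill uses local Chrome.
browser_local() (
  # The parentheses run the body in a subshell, so the caller's
  # environment keeps its keys.
  unset BROWSERBASE_API_KEY BROWSERBASE_PROJECT_ID
  browser "$@"
)

# Usage: browser_local navigate https://example.com
```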

Real-World Use Cases

Form submission automation — Fill and submit forms on sites that don't have APIs. The natural language interface handles different field types without selector work.

Data extraction at scale — Use browser extract with a JSON schema to pull structured data from any page. Consistent output format regardless of the underlying HTML structure.

Login-protected content — Navigate authentication flows with browser act — the AI interprets login forms regardless of their specific implementation.

UI testing — Replace brittle Selenium scripts with natural language tests. browser observe helps you write assertions against what's actually on the page.

Screenshot pipelines — Capture visual state of web applications as part of monitoring or reporting workflows.

Competitive monitoring — Visit competitor pages, extract pricing or feature data, structure it for analysis.

Best Practices

Always navigate before interacting. The navigate command sets the page context that act and extract operate against.

Take screenshots after actions. browser screenshot lets your agent verify that the expected change happened before continuing. This is especially important for multi-step flows.

Be specific in action descriptions. "click the button" is ambiguous on a page with 10 buttons. "click the blue 'Add to Cart' button in the product section" is not.

Use observe when actions fail. If browser act doesn't work as expected, run browser observe to see what elements are available — the page might have a different structure than expected.

Close the browser when done. Stagehand keeps a browser process alive until you close it. Always call browser close at the end of a task.
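
Several of these practices can be combined in a small wrapper. The following is a sketch, not part of the skill: act_checked and run_task are hypothetical helpers, and they assume that browser act exits with a non-zero status when an action fails, which this article does not confirm.

```shell
# act_checked and run_task are hypothetical helpers. They assume that
# 'browser act' exits non-zero on failure (not confirmed by the article).
act_checked() {
  if browser act "$1"; then
    # Capture the state after a successful action for later inspection.
    browser screenshot
  else
    # On failure, list what the page offers before giving up.
    browser observe "what buttons or forms are visible?"
    return 1
  fi
}

run_task() (
  # Runs in a subshell so the EXIT trap fires when the task ends,
  # whether the steps succeed or not.
  trap 'browser close' EXIT
  browser navigate "$1" || exit 1
  shift
  for step in "$@"; do
    act_checked "$step" || exit 1
  done
)

# Usage: run_task https://example.com "click the Sign In button" "click Submit"
```

The trap is what guarantees the close call: even if a step fails halfway through, the browser process is not left running.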

Comparison

Feature	Browser Automation	Browser Use	Desktop Control	Playwright
Natural language actions	✅	✅	✅ (screen)	❌ selectors
Cloud browser (Browserbase)	✅	✅	❌	❌
Local fallback	✅ auto	Manual	✅	✅
Data extraction	✅ structured	✅	❌	Manual
CAPTCHA handling	✅ (Browserbase)	✅ (Cloud)	❌	❌
No browser required	❌	❌	❌	❌

Considerations

Dynamic content — Single-page apps that load content asynchronously may require a browser act "wait for the page to load" step before extraction.
CAPTCHA — Local mode can't solve CAPTCHAs. Switch to Browserbase for sites that actively block automation.
Rate limiting — Even with stealth mode, aggressive scraping will eventually trigger rate limits. Build in delays for large-scale extraction.
Chrome required locally — If you're running in a headless server environment without Chrome installed, you'll need Browserbase.
API cost — Browserbase usage is billed by the minute of browser time. Complex workflows add up.
Stagehand dependency — The AI interpretation adds latency compared to direct selector-based automation. For high-throughput extraction, traditional scrapers may be faster.
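
For the dynamic-content and rate-limiting points, a paced loop is usually enough to start with. This is a sketch under stated assumptions: extract_paced is a hypothetical helper, the delay is whatever you pass in, and the wait step is the same natural language instruction mentioned above rather than a dedicated wait command.

```shell
# extract_paced is a hypothetical helper.
# Arguments: DELAY_SECONDS INSTRUCTION URL...
extract_paced() (
  delay="$1"
  instruction="$2"
  shift 2
  for url in "$@"; do
    # Skip pages that fail to load instead of aborting the whole run.
    browser navigate "$url" || continue
    # Give asynchronously loaded content a chance to appear.
    browser act "wait for the page to load"
    browser extract "$instruction"
    # Pause between pages to stay under rate limits.
    sleep "$delay"
  done
)

# Usage: extract_paced 5 "get all product names and prices" https://example.com
```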

The Bigger Picture

Natural language browser control has been a research demo for years. The Browser Automation skill makes it a practical tool: six commands, automatic environment detection, and a way to cope with the messy reality of real websites (login walls, dynamic content, CAPTCHAs), switching to Browserbase when local Chrome is not enough.

The 18,000+ downloads suggest that browser automation is one of the most common things AI agents are asked to do, and this skill lowers the barrier significantly compared to writing Playwright scripts or configuring Selenium.

View the skill on ClawHub: browser-automation
