browser-jsLightweight CDP browser control for AI agents. Token-efficient alternative to the built-in browser tool — 3-10x fewer tokens per interaction. Use when browsi...
Install via ClawdBot CLI:
clawdbot install shaihazher/browser-jsLightweight CLI that talks to Chrome via CDP (Chrome DevTools Protocol). Returns minimal, indexed output that agents can act on immediately — no accessibility tree parsing, no ref hunting.
# Install dependency (one-time, in the skill scripts/ dir)
cd scripts && npm install
# Ensure browser is running with CDP enabled.
# With OpenClaw:
# browser start profile=openclaw
# Or manually:
# google-chrome --remote-debugging-port=18800 --user-data-dir=~/.browser-data
The tool connects to http://127.0.0.1:18800 by default. Override with CDP_URL env var.
mkdir -p ~/.local/bin
cat > ~/.local/bin/bjs << 'WRAPPER'
#!/bin/bash
exec node /path/to/scripts/browser.js "$@"
WRAPPER
chmod +x ~/.local/bin/bjs
bjs tabs List open tabs
bjs open <url> Navigate to URL
bjs tab <index> Switch to tab
bjs newtab [url] Open new tab
bjs close [index] Close tab
bjs elements [selector] List interactive elements (indexed)
bjs click <index> Click element by index
bjs type <index> <text> Type into element
bjs upload <path> [selector] Upload file to input (bypasses OS dialog)
bjs text [selector] Extract visible page text
bjs html <selector> Get element HTML
bjs eval <js> Run JavaScript in page
bjs screenshot [path] Save screenshot
bjs scroll <up|down|top|bottom> [px]
bjs url Current URL
bjs back / forward / refresh
bjs wait <ms>
Coordinate commands (cross-origin iframes, captchas, overlays):
bjs click-xy <x> <y> Click at page coordinates via CDP Input
bjs click-xy <x> <y> --double Double-click at coordinates
bjs click-xy <x> <y> --right Right-click at coordinates
bjs hover-xy <x> <y> Hover at page coordinates
bjs drag-xy <x1> <y1> <x2> <y2> Drag between coordinates
bjs iframe-rect <selector> Get iframe bounding box (for click-xy targeting)
elements scans the page for all interactive elements (links, buttons, inputs, selects, etc.) — including those inside shadow DOM (web components). This means sites like Reddit, GitHub, and other modern SPAs that use shadow DOM are fully supported. The scan recursively pierces all shadow roots.
Returns a compact numbered list:
[0] (link) Hacker News → https://news.ycombinator.com/news
[1] (link) new → https://news.ycombinator.com/newest
[2] (input:text) q
[3] (button) Submit
Then click 3 or type 2 search query — immediately actionable, no interpretation needed.
Auto-indexing: click and type auto-index elements if not already indexed. You can skip calling elements first and go straight to click/type after open. Call elements explicitly when you need to see what's on the page.
After navigation or AJAX changes: Elements get re-indexed automatically on next click/type if stamps are stale. For manual re-index, call elements again.
Real mouse events: click uses CDP Input.dispatchMouseEvent (mousePressed + mouseReleased) instead of JS .click(). This triggers React/Vue/Angular synthetic event handlers that ignore plain .click() calls. Works reliably on SPAs like Instagram, GitHub, LinkedIn.
upload uses CDP's DOM.setFileInputFiles to inject files directly into hidden elements — no OS file picker dialog. Works with Instagram, Twitter, any site with file uploads.
bjs upload ~/photos/image.jpg # auto-finds input[type=file]
bjs upload ~/docs/resume.pdf "input.file-drop" # specific selector
| Approach | Tokens per interaction | Notes |
|----------|----------------------|-------|
| bjs | ~50-200 | Indexed list, 1-line responses |
| browser tool (snapshot) | ~2,000-5,000 | Full accessibility tree |
| browser tool + thinking | ~3,000-8,000 | Plus reasoning to find refs |
Over a 10-step flow: ~1,500 tokens (bjs) vs ~30,000-80,000 (browser tool).
bjs open https://example.com # Navigate
bjs elements # See what's clickable
bjs click 5 # Click element [5]
bjs type 12 "hello world" # Type into element [12]
bjs text # Read page content
bjs screenshot /tmp/result.png # Verify visually
bjs automatically pierces shadow DOM boundaries. Sites built with web components (Reddit, GitHub, etc.) work out of the box — elements, click, type, and text all recurse into shadow roots. No special flags needed.
When you can't use click by index — e.g. the target is inside a cross-origin iframe (captcha checkbox, payment form, OAuth widget) — use coordinate-based commands that dispatch real CDP Input events at the OS level. These bypass all DOM boundaries.
Workflow for clicking inside an iframe:
bjs iframe-rect 'iframe[title*="hCaptcha"]' # Get bounding box
# Output: x=95 y=440 w=302 h=76 center=(246, 478)
bjs click-xy 125 458 # Click checkbox position
iframe-rect returns the iframe's position on the page. Add offsets to target specific elements inside it (e.g. a checkbox is typically near the left side).
Other uses:
hover-xy — trigger hover menus, tooltips that need mouse positiondrag-xy — slider controls, drag-and-drop, canvas interactionsclick-xy --double — double-click to select text, expand itemsclick-xy --right — context menusWhen to use coordinate commands vs click:
click — always preferred when the element shows up in elementsclick-xy — only when the target is inside a cross-origin iframe or otherwise unreachable by DOM indexingelements with a CSS selector narrows scope: bjs elements ".modal"eval runs arbitrary JS and returns the result — use for custom extractiontext caps at 8KB — enough for most pages, won't blow up contexthtml caps at 10KB — for inspecting specific elementsgrep to filter: bjs elements | grep -i "submit\|login"Generated Mar 1, 2026
AI agents can automate posting content, uploading images/videos, and engaging with posts on platforms like Instagram or Twitter. The tool's file upload capability bypasses OS dialogs, enabling seamless media sharing. Real mouse events ensure reliable interaction with modern SPAs that use React or Vue frameworks.
Agents can navigate e-commerce sites, extract product prices and details using text extraction, and track changes over time. The lightweight token usage allows for frequent page refreshes without high costs. Shadow DOM support ensures compatibility with modern web stores built on web components.
Automate filling support forms, uploading documentation, and navigating help portals on behalf of users. The indexed element interaction eliminates parsing overhead, allowing quick form completion. Session persistence maintains logged-in states across interactions.
Perform automated UI testing by clicking elements, typing text, and verifying page content. Coordinate commands enable testing of iframe-embedded components like payment forms or captchas. Screenshot functionality provides visual verification of test results.
Systematically browse academic journals, news sites, or government portals to extract structured information. The token-efficient design enables processing numerous pages while maintaining budget control. HTML extraction allows for precise data scraping from specific page elements.
Offer browser automation as a service where clients submit tasks through an API, and your system executes them using this tool. Charge per task or monthly subscription based on usage volume. The token efficiency reduces operational costs compared to traditional browser automation solutions.
Integrate this browser control into existing enterprise workflows for data entry, report generation, or system monitoring. Provide customization, support, and maintenance contracts. The ability to handle cross-origin iframes makes it suitable for complex enterprise web applications.
Sell pre-built automation scripts and templates that leverage this browser tool for common tasks like social media posting, data scraping, or form filling. Developers can purchase ready-to-use solutions rather than building from scratch. The CLI nature makes it easy to package and distribute.
💬 Integration Tip
Ensure Chrome runs with --remote-debugging-port enabled before integration. For production use, consider wrapping the CLI commands in error handling and retry logic to manage network fluctuations or page load delays.
A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.
Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, navigate web pages, extract data from websites, take screenshots, fill forms, click buttons, or interact with web applications.
Advanced desktop automation with mouse, keyboard, and screen control
Manage n8n workflows and automations via API. Use when working with n8n workflows, executions, or automation tasks - listing workflows, activating/deactivating, checking execution status, manually triggering workflows, or debugging automation issues.
Design and implement automation workflows to save time and scale operations as a solopreneur. Use when identifying repetitive tasks to automate, building workflows across tools, setting up triggers and actions, or optimizing existing automations. Covers automation opportunity identification, workflow design, tool selection (Zapier, Make, n8n), testing, and maintenance. Trigger on "automate", "automation", "workflow automation", "save time", "reduce manual work", "automate my business", "no-code automation".
Browser automation via Playwright MCP server. Navigate websites, click elements, fill forms, extract data, take screenshots, and perform full browser automation workflows.