midscene-computer-chrome-bridge
AI-powered browser automation using Midscene Bridge mode. Use this skill when the user wants to:
- Browse, navigate, or open web pages in the user's own Chro...
Install via ClawdBot CLI:
clawdbot install quanru/midscene-computer-chrome-bridge
CRITICAL RULES — VIOLATIONS WILL BREAK THE WORKFLOW:
1. Never run midscene commands in the background. Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action. Background execution breaks the screenshot-analyze-act loop.
2. Run only one midscene command at a time. Wait for the previous command to finish, read the screenshot, then decide the next action. Never chain multiple commands together.
3. Allow enough time for each command to complete. Midscene commands involve AI inference and screen interaction, which can take longer than typical shell commands. A typical command needs about 1 minute; complex act commands may need even longer.
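When commands are issued from a shell script rather than one by one, rules 1 and 2 can be enforced mechanically. The sketch below is an assumption, not part of the Midscene CLI: the function name `run_exclusive` and the lock path `MIDSCENE_LOCK_DIR` are hypothetical. It uses `mkdir` as an atomic lock, so a second command fails immediately while the first is still running, and it runs the wrapped command in the foreground so its full output is available before the next step. It deliberately sets no time limit, in line with rule 3.

```bash
#!/usr/bin/env bash
# Hypothetical guard (not part of Midscene): allow only one midscene command at a time.
# mkdir is atomic, so it fails if another caller already holds the lock directory.
MIDSCENE_LOCK_DIR="${MIDSCENE_LOCK_DIR:-${TMPDIR:-/tmp}/midscene-bridge.lock}"

run_exclusive() {
  local status
  if ! mkdir "$MIDSCENE_LOCK_DIR" 2>/dev/null; then
    echo "another midscene command is still running; wait for it to finish" >&2
    return 1
  fi
  # Run in the foreground (no trailing &), so the caller sees the complete
  # output before deciding on the next action.
  "$@"
  status=$?
  # Release the lock and pass the wrapped command's exit status through.
  rmdir "$MIDSCENE_LOCK_DIR"
  return "$status"
}
```

Usage: `run_exclusive npx @midscene/web@1 --bridge take_screenshot`. If a run is interrupted (for example with Ctrl-C), the lock directory may be left behind and has to be removed by hand.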
Automate the user's real Chrome browser via the Midscene Chrome Extension (Bridge mode), preserving cookies, sessions, and login state. You (the AI agent) act as the brain, deciding which actions to take based on screenshots.
CRITICAL — Every command MUST follow this EXACT format. Do NOT modify the command prefix.
npx @midscene/web@1 --bridge <subcommand> [args]
The --bridge flag is MANDATORY here: it activates Bridge mode to connect to the user's desktop Chrome browser.
The user has already prepared Chrome and the Midscene Extension. Do NOT check browser or extension status before connecting; just connect directly.
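If command lines are generated or logged by other tooling before being run, a check like the following can reject any line whose prefix was altered. This is a hypothetical sketch, not part of Midscene; `has_bridge_prefix` and `MIDSCENE_PREFIX` are illustrative names. It succeeds only when the line starts with the exact prefix followed by a space and at least one more character (the subcommand).

```bash
#!/usr/bin/env bash
# Hypothetical validator (not part of Midscene): succeed only if a command line
# begins with the exact required prefix followed by a subcommand.
MIDSCENE_PREFIX="npx @midscene/web@1 --bridge"

has_bridge_prefix() {
  case "$1" in
    # Quoted part matches literally; the unquoted ?* requires at least one more character.
    "$MIDSCENE_PREFIX "?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `npx @midscene/web --bridge take_screenshot` (missing `@1`) and `npx @midscene/web@1 take_screenshot` (missing `--bridge`) are both rejected.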
Midscene requires models with strong visual grounding capabilities. The following environment variables must be configured — either as system environment variables or in a .env file in the current working directory (Midscene loads .env automatically):
MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."
MIDSCENE_MODEL_FAMILY="family-identifier"
Example: Gemini (Gemini-3-Flash)
MIDSCENE_MODEL_API_KEY="your-google-api-key"
MIDSCENE_MODEL_NAME="gemini-3-flash"
MIDSCENE_MODEL_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
MIDSCENE_MODEL_FAMILY="gemini"
Example: Qwen3-VL
MIDSCENE_MODEL_API_KEY="your-openrouter-api-key"
MIDSCENE_MODEL_NAME="qwen/qwen3-vl-235b-a22b-instruct"
MIDSCENE_MODEL_BASE_URL="https://openrouter.ai/api/v1"
MIDSCENE_MODEL_FAMILY="qwen3-vl"
Example: Doubao Seed 1.6
MIDSCENE_MODEL_API_KEY="your-doubao-api-key"
MIDSCENE_MODEL_NAME="doubao-seed-1-6-250615"
MIDSCENE_MODEL_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
MIDSCENE_MODEL_FAMILY="doubao-vision"
Commonly used models: Doubao Seed 1.6, Qwen3-VL, Zhipu GLM-4.6V, Gemini-3-Pro, Gemini-3-Flash.
If the model is not configured, ask the user to set it up. See Model Configuration for supported providers.
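Before connecting, a quick shell check can confirm that the four variables are present, which makes it easier to tell whether the user needs to set them up. This is a hypothetical helper, not part of Midscene. Because Midscene reads `.env` itself, the sketch also accepts a `NAME=...` line in `./.env` in the current directory. It only checks that each variable is present, not that the key, URL, or model name is valid.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Midscene): report which of the
# required model variables are missing from both the environment and ./.env.
REQUIRED_MIDSCENE_VARS=(
  MIDSCENE_MODEL_API_KEY
  MIDSCENE_MODEL_NAME
  MIDSCENE_MODEL_BASE_URL
  MIDSCENE_MODEL_FAMILY
)

check_midscene_env() {
  local var missing=0
  for var in "${REQUIRED_MIDSCENE_VARS[@]}"; do
    # ${!var} is bash indirect expansion: the value of the variable named in $var.
    if [ -n "${!var:-}" ]; then
      continue
    fi
    # Otherwise accept a NAME=... line in a .env file in the current directory.
    if [ -f .env ] && grep -q "^${var}=" .env; then
      continue
    fi
    echo "missing: $var" >&2
    missing=1
  done
  return "$missing"
}
```

A non-zero return (with the missing names printed to stderr) is the cue to ask the user to configure the model.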
npx @midscene/web@1 --bridge connect --url https://example.com
npx @midscene/web@1 --bridge take_screenshot
After taking a screenshot, read the saved image file to understand the current page state before deciding the next action.
Use act to interact with the page and get the result. It autonomously handles all UI interactions internally — clicking, typing, scrolling, hovering, waiting, and navigating — so you should give it complex, high-level tasks as a whole rather than breaking them into small steps. Describe what you want to do and the desired effect in natural language:
# specific instructions
npx @midscene/web@1 --bridge act --prompt "click the Login button and fill in the email field with 'user@example.com'"
npx @midscene/web@1 --bridge act --prompt "scroll down and click the Submit button"
# or target-driven instructions
npx @midscene/web@1 --bridge act --prompt "click the country dropdown and select Japan"
npx @midscene/web@1 --bridge disconnect
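Since CLI commands are stateless between invocations, follow this pattern:
1. Connect to the target page with connect --url.
2. Take a screenshot and read it to understand the current page state.
3. Use act to perform the desired action or target-driven instructions.
4. Disconnect when done.
Best practices:
1. Always connect first: run connect --url before any interaction.
2. Be specific about UI elements: instead of a vague description like "the button", say "the blue Submit button in the contact form".
3. Use visual descriptions, not CSS selectors: say "the red Buy Now button" instead of "#buy-btn".
4. Batch related operations into a single act command: when performing consecutive operations within the same page, combine them into one act prompt instead of splitting them into separate commands. For example, "fill in the email and password fields, then click the Login button" should be a single act call, not three. This reduces round-trips, avoids unnecessary screenshot-analyze cycles, and is significantly faster.
Example — Dropdown selection: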
npx @midscene/web@1 --bridge act --prompt "click the country dropdown and select Japan"
npx @midscene/web@1 --bridge take_screenshot
Example — Form interaction:
npx @midscene/web@1 --bridge act --prompt "fill in the email field with 'user@example.com' and the password field with 'pass123', then click the Log In button"
npx @midscene/web@1 --bridge take_screenshot
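When the steps for one page are assembled in a script, the batching practice can be applied mechanically. `join_act_steps` is a hypothetical helper, not part of Midscene: it only concatenates plain-language steps with ", then " so they can be passed as a single act --prompt value. It does not guarantee the model will carry out a long combined prompt reliably, so keep each batch to operations on the same page.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Midscene): merge several same-page steps
# into one natural-language prompt for a single act call.
join_act_steps() {
  [ "$#" -gt 0 ] || return 1   # need at least one step
  local prompt="$1" step
  shift
  for step in "$@"; do
    prompt="$prompt, then $step"
  done
  printf '%s\n' "$prompt"
}
```

Usage, with a prompt similar in intent to the form example: `npx @midscene/web@1 --bridge act --prompt "$(join_act_steps "fill in the email field with 'user@example.com'" "fill in the password field with 'pass123'" "click the Log In button")"`.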
Generated Mar 1, 2026
Automatically track competitor pricing across multiple e-commerce sites that require login sessions. The skill can navigate through authenticated portals, extract pricing data from product pages, and capture screenshots for visual verification of promotions or stock status.
Automate login and data extraction from multiple banking or investment portals that use complex authentication. The skill can navigate through secure login flows, extract account balances and transaction histories, and verify successful data capture through screenshots.
Monitor brand mentions and content across social media platforms that require user sessions. The skill can log into platforms, scroll through feeds, capture screenshots of specific posts, and verify UI elements like engagement metrics or ad placements.
Automate patient portal interactions for appointment scheduling and medical record access. The skill can handle secure login sessions, navigate through healthcare interfaces, fill out appointment forms, and capture confirmation screens for documentation.
Test multi-step booking flows on travel websites that maintain session states. The skill can simulate user journeys through flight searches, hotel selections, payment forms, and capture screenshots at each step to validate UI behavior and error handling.
Offer browser automation services to businesses needing regular data extraction or workflow testing. Charge monthly subscriptions based on automation complexity and frequency, with premium tiers for multi-step workflows requiring session persistence.
Provide automated UI testing services for web applications, particularly those with complex authentication flows. Charge per test suite or project, with additional fees for cross-browser testing and visual regression analysis using captured screenshots.
Build a platform that aggregates data from multiple authenticated sources for market research or competitive analysis. Monetize through API access fees or packaged data reports, leveraging the skill's ability to maintain persistent login sessions across sources.
💬 Integration Tip
Ensure environment variables for visual AI models are properly configured before execution, and always follow the synchronous command pattern to maintain the screenshot-analyze-act loop.
A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.
Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, navigate web pages, extract data from websites, take screenshots, fill forms, click buttons, or interact with web applications.
Advanced desktop automation with mouse, keyboard, and screen control
Manage n8n workflows and automations via API. Use when working with n8n workflows, executions, or automation tasks - listing workflows, activating/deactivating, checking execution status, manually triggering workflows, or debugging automation issues.
Design and implement automation workflows to save time and scale operations as a solopreneur. Use when identifying repetitive tasks to automate, building workflows across tools, setting up triggers and actions, or optimizing existing automations. Covers automation opportunity identification, workflow design, tool selection (Zapier, Make, n8n), testing, and maintenance. Trigger on "automate", "automation", "workflow automation", "save time", "reduce manual work", "automate my business", "no-code automation".
Browser automation via Playwright MCP server. Navigate websites, click elements, fill forms, extract data, take screenshots, and perform full browser automation workflows.