web-scraper
Web scraping and content comprehension agent: multi-strategy extraction with cascade fallback, news detection, boilerplate removal, structured metadata, and...
Install via ClawdBot CLI:
clawdbot install guifav/web-scraper
Grade: Good, based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Calls external URL not in known-safe list
https://openrouter.ai/api/v1/chat/completions
Audited Apr 16, 2026 · audit v1.0
Generated Mar 1, 2026
Marketing agencies can use this skill to scrape competitor websites, news articles, and blog posts to analyze content strategies, track industry trends, and identify keywords. The multi-strategy extraction ensures reliable data collection even from JavaScript-heavy sites, while LLM entity extraction helps identify key players and topics.
Researchers in social sciences or digital humanities can scrape news archives, academic blogs, and online publications to gather datasets for analysis. The pipeline's cleaning and metadata extraction stages produce structured JSON suitable for quantitative analysis, and the planning protocol helps manage ethical and technical risks like paywalls.
Investment firms can automate scraping of financial news sites, press releases, and regulatory updates to monitor market-moving events. The news detection and entity extraction stages filter relevant articles and extract companies, dates, and relationships, enabling real-time alerts and trend analysis.
E-commerce businesses can scrape product pages, customer reviews, and competitor pricing from various online retailers. The cascade extraction handles dynamic content, while structured metadata extraction captures prices, ratings, and categories for competitive analysis and inventory management.
Public relations agencies can track brand mentions, news coverage, and social media posts across the web. The skill's ability to detect articles and extract entities like organizations and locations helps in reputation management and reporting client impact metrics efficiently.
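The cascade extraction mentioned in the use cases above can be pictured roughly as follows. This is a minimal sketch, not the skill's actual code: the function name cascade_extract, the min_chars threshold, and the strategy names in the usage note are all hypothetical, chosen only to illustrate the idea of trying extractors in order and falling back when one fails or returns too little text.

```python
from typing import Callable, Iterable, Optional, Tuple

# An extractor takes raw HTML and returns extracted text, or None on failure.
Extractor = Callable[[str], Optional[str]]


def cascade_extract(
    html: str,
    strategies: Iterable[Tuple[str, Extractor]],
    min_chars: int = 200,  # hypothetical quality threshold, not from the skill
) -> Tuple[Optional[str], Optional[str]]:
    """Try each (name, extractor) pair in order.

    Returns (strategy_name, text) for the first extractor whose output is
    at least min_chars long, or (None, None) if every strategy falls short.
    """
    for name, extract in strategies:
        try:
            text = extract(html)
        except Exception:
            # A failing strategy should not abort the cascade; move on.
            continue
        if text and len(text.strip()) >= min_chars:
            return name, text.strip()
    return None, None
```

In practice the strategies list might run from cheapest to most expensive, for example a static-HTML extractor first and a headless-browser render last, so that JavaScript-heavy pages only pay the rendering cost when the lighter strategies come back empty.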
Offer a subscription-based platform where clients input URLs or domains to receive structured scraped data via API or dashboard. Revenue comes from tiered plans based on volume, with premium features like real-time monitoring and custom entity extraction using the LLM stage.
Provide bespoke web scraping solutions for enterprises, integrating this skill into their existing data pipelines or workflows. Revenue is generated through project-based fees, ongoing maintenance contracts, and training sessions on using the planning protocol and pipeline effectively.
Scrape public web data at scale, clean and enrich it using the pipeline's stages, and sell aggregated datasets or insights reports to businesses in specific industries. Revenue streams include one-time sales of datasets and subscription access to updated reports.
💬 Integration Tip
Ensure the OPENROUTER_API_KEY is set in the environment for LLM entity extraction, and install required Python packages like playwright and trafilatura before execution to avoid pipeline failures.
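A small preflight check along the lines of the tip above might look like the sketch below. The function name preflight and its parameters are hypothetical and not part of the skill; the check only confirms that the environment variable is non-empty and that the named packages are importable. It does not validate the key or confirm that Playwright's browser binaries (normally fetched with `playwright install`) are present.

```python
import importlib.util
import os
from typing import List, Mapping, Optional, Sequence


def preflight(
    required_packages: Sequence[str] = ("playwright", "trafilatura"),
    env_var: str = "OPENROUTER_API_KEY",
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    environ = os.environ if environ is None else environ
    problems = []
    # The LLM entity extraction stage needs the API key in the environment.
    if not environ.get(env_var):
        problems.append(f"environment variable {env_var} is not set")
    # find_spec returns None when a top-level package cannot be located.
    for name in required_packages:
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package {name!r} is not installed")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

Running the check before the pipeline starts surfaces a missing key or package as a clear message rather than a failure partway through a scrape.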
Scored Apr 18, 2026
A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.
Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, navigate web pages, extract data from websites, take screenshots, fill forms, click buttons, or interact with web applications.
Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with w...
Headless browser automation CLI optimized for AI agents with accessibility tree snapshots and ref-based element selection
Playwright-based web scraping OpenClaw Skill with anti-bot protection. Successfully tested on complex sites like Discuss.com.hk.
Uses a headless browser to navigate web pages, interact with elements, and extract clean, readable text content from URLs.