scrape: Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling.
Install via ClawdBot CLI:
clawdbot install ivangdavila/scrape

Before writing any scraping code:
- Fetch {domain}/robots.txt and check whether the target path is disallowed. If it is, stop.
- Review /terms, /tos, and /legal. An explicit scraping prohibition means you need permission first.
- Identify your crawler with a descriptive User-Agent, e.g. Mozilla/5.0 ... (contact: you@email.com).
- For code patterns and a robots.txt parser, see code.md.
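The robots.txt check above can be sketched with Python's standard-library parser. The robots.txt content, bot name, and paths below are hypothetical, for illustration only:

```python
from urllib import robotparser

# Hypothetical robots.txt content; in practice, fetch {domain}/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Stop before scraping any path the file disallows for your agent.
print(rp.can_fetch("my-bot", "https://example.com/private/page"))  # False
print(rp.can_fetch("my-bot", "https://example.com/products"))      # True
```

In real use, call `rp.set_url(f"{domain}/robots.txt")` and `rp.read()` instead of parsing a string, and check `can_fetch` before every request.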
Generated Mar 1, 2026
Scrape public pricing data from e-commerce sites like Amazon or Walmart to track competitor prices. Ensure compliance by checking robots.txt, using rate limiting, and avoiding personal data to support dynamic pricing strategies.
Collect property details such as prices, locations, and features from public real estate websites like Zillow. Follow legal guidelines by respecting robots.txt, minimizing data storage, and stripping any PII to create aggregated market reports.
Scrape job postings from public career sites like LinkedIn or Indeed to analyze hiring trends and skill demands. Comply by sending a proper User-Agent, avoiding login-protected data, and collecting no personal information when generating industry insights.
Gather public factual data from educational or government websites for research purposes, such as climate statistics or economic indicators. Adhere to legal boundaries by checking Terms of Service, implementing rate limits, and maintaining an audit trail for transparency.
Extract headlines and article summaries from news websites to track media trends or sentiment analysis. Ensure compliance by verifying robots.txt, using session reuse to reduce server load, and avoiding republishing copyrighted content.
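Several of the use cases above call for stripping PII before storage. A minimal redaction pass might look like the following; the regex patterns are rough assumptions for illustration, not a complete PII solution:

```python
import re

# Illustrative patterns only: real PII detection needs broader coverage
# (names, addresses, IDs) and ideally a dedicated library.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def strip_pii(text: str) -> str:
    """Redact email addresses and phone-like numbers from scraped text."""
    text = EMAIL_RE.sub("[email]", text)
    text = PHONE_RE.sub("[phone]", text)
    return text

print(strip_pii("Contact jane.doe@example.com or +1 (555) 123-4567."))
# Contact [email] or [phone].
```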
Provide cleaned and structured scraped data to clients via subscription or one-time sales. Focus on legal compliance by using APIs when available, stripping PII, and maintaining audit trails to build trust and avoid violations.
Offer analytics dashboards based on scraped data, such as competitor insights or trend reports. Generate revenue through SaaS subscriptions by ensuring data is collected ethically with rate limiting and robots.txt compliance.
Develop and sell tailored scraping scripts or services for specific client needs, like real estate or retail monitoring. Monetize through project-based fees by emphasizing legal adherence, such as GDPR-aware data handling and ToS checks.
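Rate limiting recurs across all of these scenarios. One simple sketch is a fixed-interval limiter placed in front of every request; the class name and interval are illustrative:

```python
import time

class RateLimiter:
    """Allow at most one request per `interval` seconds."""

    def __init__(self, interval: float):
        self.interval = interval
        self._last = 0.0

    def wait(self) -> None:
        # Sleep just long enough to keep requests `interval` seconds apart.
        elapsed = time.monotonic() - self._last
        if elapsed < self.interval:
            time.sleep(self.interval - elapsed)
        self._last = time.monotonic()

limiter = RateLimiter(interval=0.1)
start = time.monotonic()
for _ in range(3):
    limiter.wait()  # in real use, issue the HTTP request after this call
print(time.monotonic() - start >= 0.2)  # True: calls were spaced out
```

A production scraper would usually add jitter and honor any Crawl-delay directive from robots.txt, but the shape is the same: gate every request through one shared limiter.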
💬 Integration Tip
Integrate this skill by first checking robots.txt and Terms of Service programmatically, then using rate-limited requests with proper User-Agents to avoid legal issues and server strain.
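The tip above can be sketched with the standard library. The bot name and contact address below are placeholders you would replace with your own:

```python
import urllib.request

# Hypothetical identifier; use your real project name and contact address.
USER_AGENT = "MyBot/1.0 (contact: you@email.com)"

def build_request(url: str) -> urllib.request.Request:
    """Build a request that identifies the crawler.

    The caller is expected to have already passed the robots.txt and
    Terms of Service checks, and to send this through a rate limiter.
    """
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

req = build_request("https://example.com/products")
# urllib normalizes header names, hence the "User-agent" capitalization.
print(req.get_header("User-agent"))  # MyBot/1.0 (contact: you@email.com)
```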