playwright-scraper-skill
Playwright-based web scraping OpenClaw Skill with anti-bot protection. Successfully tested on complex sites like Discuss.com.hk.
Install via ClawdBot CLI:
clawdbot install waisimon/playwright-scraper-skill
A Playwright-based web scraping OpenClaw Skill with anti-bot protection. Choose the best approach based on the target website's anti-bot level.
| Target Website | Anti-Bot Level | Recommended Method | Script |
|---------------|----------------|-------------------|--------|
| Regular Sites | Low | web_fetch tool | N/A (built-in) |
| Dynamic Sites | Medium | Playwright Simple | scripts/playwright-simple.js |
| Cloudflare Protected | High | Playwright Stealth ⭐ | scripts/playwright-stealth.js |
| YouTube | Special | deep-scraper | Install separately |
| Reddit | Special | reddit-scraper | Install separately |
cd playwright-scraper-skill
npm install
npx playwright install chromium
Use OpenClaw's built-in web_fetch tool:
# Invoke directly in OpenClaw
Hey, fetch me the content from https://example.com
Use Playwright Simple:
node scripts/playwright-simple.js "https://example.com"
Example output:
{
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "...",
  "elapsedSeconds": "3.45"
}
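To consume this output from another Node script, the JSON can be parsed and the elapsed time converted to a number. The following is only a sketch: it assumes the script prints exactly one JSON object to stdout, which this README does not confirm, and `parseScrapeResult` and `runSimple` are hypothetical helper names, not part of the skill.

```javascript
// Hypothetical helper: parse the JSON printed by scripts/playwright-simple.js.
// Assumes stdout holds a single JSON object shaped like the example output.
function parseScrapeResult(stdout) {
  const result = JSON.parse(stdout);
  for (const key of ['url', 'title', 'content']) {
    if (typeof result[key] !== 'string') {
      throw new Error(`Missing or non-string field: ${key}`);
    }
  }
  // elapsedSeconds is a string in the example output ("3.45"); convert it.
  return { ...result, elapsedSeconds: Number(result.elapsedSeconds) };
}

// Hypothetical wrapper: run the script as a child process and parse its stdout.
function runSimple(url) {
  const { execFile } = require('node:child_process');
  const options = { maxBuffer: 16 * 1024 * 1024 }; // allow large page content
  return new Promise((resolve, reject) => {
    execFile('node', ['scripts/playwright-simple.js', url], options, (err, stdout) => {
      if (err) return reject(err);
      try {
        resolve(parseScrapeResult(stdout));
      } catch (parseErr) {
        reject(parseErr);
      }
    });
  });
}
```

If the script also writes progress messages to stdout, the parsing step would need to isolate the JSON portion first.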
Use Playwright Stealth:
node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"
Features:
Hides automation markers (navigator.webdriver = false)
Use deep-scraper (install separately):
# Install deep-scraper skill
npx clawhub install deep-scraper
# Use it
cd skills/deep-scraper
node assets/youtube_handler.js "https://www.youtube.com/watch?v=VIDEO_ID"
scripts/playwright-simple.js
scripts/playwright-stealth.js ⭐
If the site doesn't have dynamic loading, use OpenClaw's web_fetch tool; it's the fastest option.
If you need to wait for JavaScript rendering, use playwright-simple.js.
If you encounter 403 or Cloudflare challenges, use playwright-stealth.js.
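The three rules above can be expressed as a small decision helper. This is an illustrative sketch only: `chooseMethod` and the shape of its `site` argument are hypothetical and not part of the skill.

```javascript
// Hypothetical helper mirroring the guidance above.
// `site` is an assumed shape: { needsJsRendering: boolean, blocked: boolean },
// where `blocked` means a 403 or a Cloudflare challenge was observed.
function chooseMethod(site) {
  if (site.blocked) return 'scripts/playwright-stealth.js';
  if (site.needsJsRendering) return 'scripts/playwright-simple.js';
  return 'web_fetch';
}

console.log(chooseMethod({ needsJsRendering: true, blocked: false }));
```

Checking `blocked` first reflects the escalation order: a blocked site needs the stealth script whether or not it also renders content with JavaScript.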
All scripts support environment variables:
# Set screenshot path
SCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL
# Set wait time (milliseconds)
WAIT_TIME=10000 node scripts/playwright-simple.js URL
# Enable headful mode (show browser)
HEADLESS=false node scripts/playwright-stealth.js URL
# Save HTML
SAVE_HTML=true node scripts/playwright-stealth.js URL
# Custom User-Agent
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js URL
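For readers writing their own script with the same conventions, the variables above could be read into one config object roughly as follows. This is a sketch, not the skill's actual code: `loadConfig` is a hypothetical name, and the defaults (5000 ms wait, headless on, no HTML saved) are assumptions, since the README does not state them.

```javascript
// Hypothetical config loader for the environment variables listed above.
// All defaults here are assumptions, not values confirmed by the skill.
function loadConfig(env = process.env) {
  const waitTime = Number.parseInt(env.WAIT_TIME ?? '', 10);
  return {
    screenshotPath: env.SCREENSHOT_PATH || null,
    waitTime: Number.isNaN(waitTime) ? 5000 : waitTime, // milliseconds
    headless: env.HEADLESS !== 'false', // only the literal "false" shows the browser
    saveHtml: env.SAVE_HTML === 'true',
    userAgent: env.USER_AGENT || null,
  };
}

console.log(loadConfig({ WAIT_TIME: '10000', HEADLESS: 'false' }));
```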
| Method | Speed | Anti-Bot | Success Rate (Discuss.com.hk) |
|--------|-------|----------|-------------------------------|
| web_fetch | ⚡ Fastest | ❌ None | 0% |
| Playwright Simple | 🚀 Fast | ⚠️ Low | 20% |
| Playwright Stealth | ⏱️ Medium | ✅ Medium | 100% ✅ |
| Puppeteer Stealth | ⏱️ Medium | ✅ Medium-High | ~80% |
| Crawlee (deep-scraper) | 🐢 Slow | ❌ Detected | 0% |
| Chaser (Rust) | ⏱️ Medium | ❌ Detected | 0% |
Lessons learned from our testing:
Hiding navigator.webdriver: essential
addInitScript (Playwright): inject before page load
Solution: Use playwright-stealth.js
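The source of playwright-stealth.js is not reproduced in this README, so the following is only a minimal sketch of the navigator.webdriver and addInitScript technique named above. `maskWebdriver` and `openStealthPage` are hypothetical names, and a real stealth setup typically patches more than this one property.

```javascript
// Runs inside the page before any site script. Playwright serializes this
// function, so it must not use variables from the surrounding Node scope.
// The `nav` parameter exists only so the function can be exercised with a
// stand-in object; in the browser it defaults to the real navigator.
function maskWebdriver(nav = navigator) {
  Object.defineProperty(nav, 'webdriver', {
    get: () => false,
    configurable: true,
  });
}

// Hypothetical wiring. Playwright is required lazily so that maskWebdriver
// can be loaded without Playwright installed.
async function openStealthPage(url) {
  const { chromium } = require('playwright');
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext();
  await context.addInitScript(maskWebdriver); // injected before page load
  const page = await context.newPage();
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  return { browser, page };
}
```

The caller is responsible for `await browser.close()` once the page content has been read.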
Solution:
headless: false (headful mode sometimes has a higher success rate)
Solution:
waitForTimeout
waitUntil: 'networkidle' or 'domcontentloaded'
Best Solution: Pure Playwright + anti-bot techniques (framework-independent)
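One way to combine the two waitUntil values mentioned above is to try the stricter 'networkidle' condition first and fall back to 'domcontentloaded' on a timeout. This is a sketch under assumptions, not the skill's behavior: `gotoWithFallback` is a hypothetical helper, and the 30000 ms timeout is an arbitrary choice.

```javascript
// Hypothetical helper: navigate with 'networkidle', then retry with
// 'domcontentloaded' if the first attempt times out.
// `page` is a Playwright Page (anything with a compatible goto() works).
async function gotoWithFallback(page, url, timeout = 30000) {
  try {
    return await page.goto(url, { waitUntil: 'networkidle', timeout });
  } catch (err) {
    // Playwright timeout errors carry the name 'TimeoutError'.
    if (err.name !== 'TimeoutError') throw err;
    return await page.goto(url, { waitUntil: 'domcontentloaded', timeout });
  }
}
```

Pages that poll in the background may never reach network idle, which is why the fallback applies only to timeouts and rethrows every other error.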
browser tool
Generated Mar 1, 2026
Scraping competitor product listings, prices, and reviews from dynamic e-commerce sites to analyze market trends and adjust pricing strategies. This is useful for businesses needing real-time data on product availability and customer sentiment.
Collecting articles and updates from news websites with anti-bot protection to provide timely content for media analysis or alert services. This helps organizations track breaking news and industry developments efficiently.
Extracting research papers, forum discussions, or educational content from protected academic sites for analysis in studies or literature reviews. This supports researchers in gathering large datasets without manual effort.
Gathering property details, prices, and images from real estate portals that use JavaScript rendering to compile market reports or feed data into listing platforms. This aids agents and investors in making informed decisions.
Scraping public posts or comments from social media-like forums with anti-bot measures to analyze user sentiment and trends for brand monitoring or marketing insights. This is valuable for companies tracking online reputation.
Offering subscription-based access to scraped data from protected websites, providing clients with regular updates and insights for decision-making. Revenue is generated through monthly or annual fees based on data volume and frequency.
Developing tailored scraping scripts for specific client needs, such as extracting data from niche sites with high anti-bot levels. Revenue comes from one-time development charges or ongoing maintenance contracts.
Integrating the scraping skill into existing business applications via APIs, enabling automated data feeds for internal tools or customer-facing products. Revenue is earned through licensing fees or per-API-call pricing.
💬 Integration Tip
Start with the web_fetch tool for simple sites and escalate to stealth scripts only when needed to optimize performance and avoid unnecessary complexity.