playwright-scraper-skill-1-2-0
Playwright-based web scraping OpenClaw Skill with anti-bot protection. Successfully tested on complex sites like Discuss.com.hk.
Install via ClawdBot CLI:
clawdbot install itsjustFred/playwright-scraper-skill-1-2-0
A Playwright-based web scraping OpenClaw Skill with anti-bot protection. Choose the best approach based on the target website's anti-bot level.
| Target Website | Anti-Bot Level | Recommended Method | Script |
|---------------|----------------|-------------------|--------|
| Regular Sites | Low | web_fetch tool | N/A (built-in) |
| Dynamic Sites | Medium | Playwright Simple | scripts/playwright-simple.js |
| Cloudflare Protected | High | Playwright Stealth ⭐ | scripts/playwright-stealth.js |
| YouTube | Special | deep-scraper | Install separately |
| Reddit | Special | reddit-scraper | Install separately |
cd playwright-scraper-skill
npm install
npx playwright install chromium
Use OpenClaw's built-in web_fetch tool:
# Invoke directly in OpenClaw
Hey, fetch me the content from https://example.com
Use Playwright Simple:
node scripts/playwright-simple.js "https://example.com"
Example output:
{
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "...",
  "elapsedSeconds": "3.45"
}
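To consume this output from another Node program, a small parser can validate the fields and convert `elapsedSeconds` from a string to a number. The sketch below is illustrative only: `parseScrapeResult` is a hypothetical helper, and it assumes the script prints nothing but the JSON object shown above.

```javascript
// Hypothetical helper: parse the JSON printed by a scraper script.
// Assumption: stdout contains only the JSON object from the example output.
function parseScrapeResult(stdout) {
  const result = JSON.parse(stdout);
  for (const key of ['url', 'title', 'content', 'elapsedSeconds']) {
    if (!(key in result)) {
      throw new Error(`Missing field in scraper output: ${key}`);
    }
  }
  return {
    url: result.url,
    title: result.title,
    content: result.content,
    // elapsedSeconds arrives as a string such as "3.45"; convert it for arithmetic.
    elapsedSeconds: Number(result.elapsedSeconds),
  };
}
```

The input string could come from running the script with Node's `child_process.execFileSync` and `{ encoding: 'utf8' }`, provided the script's stdout really is pure JSON.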
Use Playwright Stealth:
node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"
Features:
Hides the automation flag (navigator.webdriver = false)
Use deep-scraper (install separately):
# Install deep-scraper skill
npx clawhub install deep-scraper
# Use it
cd skills/deep-scraper
node assets/youtube_handler.js "https://www.youtube.com/watch?v=VIDEO_ID"
scripts/playwright-simple.js
scripts/playwright-stealth.js ⭐
If the site doesn't have dynamic loading, use OpenClaw's web_fetch tool; it's fastest.
If you need to wait for JavaScript rendering, use playwright-simple.js.
If you encounter 403 or Cloudflare challenges, use playwright-stealth.js.
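The three rules above can be expressed as a small decision function. This is a sketch rather than part of the skill: `chooseMethod` and its `needsJavaScript` and `blocked` flags are hypothetical names for what you observe about the target site.

```javascript
// Hypothetical helper: map observed site behaviour to the recommended method.
// `blocked` means a 403 or Cloudflare challenge was seen;
// `needsJavaScript` means content only appears after client-side rendering.
function chooseMethod({ needsJavaScript = false, blocked = false } = {}) {
  if (blocked) {
    return { method: 'Playwright Stealth', script: 'scripts/playwright-stealth.js' };
  }
  if (needsJavaScript) {
    return { method: 'Playwright Simple', script: 'scripts/playwright-simple.js' };
  }
  // Static pages: the built-in web_fetch tool needs no script.
  return { method: 'web_fetch', script: null };
}
```

Checking `blocked` first means a protected site is sent to the stealth script even when it also relies on JavaScript rendering.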
All scripts support environment variables:
# Set screenshot path
SCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL
# Set wait time (milliseconds)
WAIT_TIME=10000 node scripts/playwright-simple.js URL
# Enable headful mode (show browser)
HEADLESS=false node scripts/playwright-stealth.js URL
# Save HTML
SAVE_HTML=true node scripts/playwright-stealth.js URL
# Custom User-Agent
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js URL
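For reference, here is one way a Node script could read these variables into a configuration object. It is a sketch under assumptions, not the skill's actual parsing code: `readScraperConfig` is a hypothetical name, and the fallback values (a 5000 ms wait, headless on, no screenshot) are illustrative defaults rather than documented ones.

```javascript
// Hypothetical sketch of reading the documented environment variables.
// The fallback values are assumptions, not the scripts' real defaults.
function readScraperConfig(env = process.env) {
  const waitTime = Number.parseInt(env.WAIT_TIME ?? '', 10);
  return {
    screenshotPath: env.SCREENSHOT_PATH || null,
    // WAIT_TIME is in milliseconds; fall back to an assumed 5000 ms.
    waitTime: Number.isNaN(waitTime) ? 5000 : waitTime,
    // Only the literal string "false" switches to headful mode.
    headless: env.HEADLESS !== 'false',
    saveHtml: env.SAVE_HTML === 'true',
    userAgent: env.USER_AGENT || null,
  };
}
```

Environment variables are always strings, which is why `HEADLESS` and `SAVE_HTML` are compared against `'false'` and `'true'` instead of being treated as booleans.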
| Method | Speed | Anti-Bot | Success Rate (Discuss.com.hk) |
|--------|-------|----------|-------------------------------|
| web_fetch | ⚡ Fastest | ❌ None | 0% |
| Playwright Simple | 🚀 Fast | ⚠️ Low | 20% |
| Playwright Stealth | ⏱️ Medium | ✅ Medium | 100% ✅ |
| Puppeteer Stealth | ⏱️ Medium | ✅ Medium-High | ~80% |
| Crawlee (deep-scraper) | 🐢 Slow | ❌ Detected | 0% |
| Chaser (Rust) | ⏱️ Medium | ❌ Detected | 0% |
Lessons learned from our testing:
navigator.webdriver: Essential
addInitScript (Playwright): Inject before page load
Solution: Use playwright-stealth.js
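As an illustration of the addInitScript technique, the sketch below overrides `navigator.webdriver` before any page script runs. It is not the contents of playwright-stealth.js: `hideWebdriver` and `openStealthPage` are hypothetical names, and the script in this skill may apply further measures.

```javascript
// Runs inside the page before any site script. `nav` defaults to the
// browser's navigator; the parameter exists only so the function can be
// exercised outside a browser.
function hideWebdriver(nav = navigator) {
  Object.defineProperty(nav, 'webdriver', { get: () => false, configurable: true });
}

// Hypothetical wrapper: open a page with the override installed.
async function openStealthPage(url, { headless = true } = {}) {
  const { chromium } = require('playwright'); // loaded lazily, only when called
  const browser = await chromium.launch({ headless });
  const context = await browser.newContext();
  // addInitScript registers the function to run in every new document.
  await context.addInitScript(hideWebdriver);
  const page = await context.newPage();
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  return { browser, page };
}
```

Registering the override on the context rather than on a single page means it also applies to any additional pages or popups opened from that context.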
Solution:
headless: false (headful mode sometimes has higher success rate)
Solution:
waitForTimeout
waitUntil: 'networkidle' or 'domcontentloaded'
Best Solution: Pure Playwright + anti-bot techniques (framework-independent)
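The wait-related options above can be combined into a navigation helper that tries `'networkidle'` first and falls back to `'domcontentloaded'`, pausing with `waitForTimeout` after a successful load. This is a sketch with hypothetical names (`navigationAttempts`, `gotoWithFallback`) and assumed timeout values, not the logic of the bundled scripts.

```javascript
// Ordered navigation attempts: strictest wait condition first.
// The 30000 ms timeout is an assumed value.
function navigationAttempts(timeoutMs = 30000) {
  return [
    { waitUntil: 'networkidle', timeout: timeoutMs },
    { waitUntil: 'domcontentloaded', timeout: timeoutMs },
  ];
}

// `page` is a Playwright Page. `extraWaitMs` is a fixed pause that gives
// late-rendering content time to appear (assumed default: 5000 ms).
async function gotoWithFallback(page, url, extraWaitMs = 5000) {
  let lastError;
  for (const options of navigationAttempts()) {
    try {
      await page.goto(url, options);
      await page.waitForTimeout(extraWaitMs);
      return options.waitUntil; // report which wait condition succeeded
    } catch (error) {
      lastError = error; // try the next, more lenient condition
    }
  }
  throw lastError;
}
```

Pages that keep long-lived connections open may never reach `'networkidle'`, which is why the more lenient condition is kept as a second attempt.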
browser tool
Generated Mar 1, 2026
Scraping competitor product listings, prices, and reviews from dynamic e-commerce sites to analyze market trends and adjust pricing strategies. This is useful for businesses needing real-time data on competitor offerings and customer sentiment.
Collecting articles and updates from news websites with anti-bot protection to provide timely content for media analysis or alert services. This helps organizations track breaking news and industry developments without manual effort.
Extracting property listings, prices, and details from real estate portals that use JavaScript rendering to gather market insights for investors or agencies. This supports decision-making in property valuation and investment opportunities.
Scraping scholarly articles, forums, or discussion boards like Discuss.com.hk to collect data for social science or market studies, bypassing anti-bot measures to ensure comprehensive dataset acquisition.
Pulling stock prices, financial news, and economic indicators from protected financial websites to feed into analysis tools for traders and analysts, enabling automated updates without manual data entry.
Offering subscription-based access to scraped data from protected websites, providing clients with cleaned and structured datasets for analysis. Revenue is generated through monthly or annual fees based on data volume and frequency.
Developing and selling tailored scraping scripts or consulting services for businesses needing specific data extraction from complex sites. Revenue comes from one-time project fees or ongoing maintenance contracts.
Integrating the skill into a larger API platform that offers web scraping as a service, allowing developers to access data via API calls. Revenue is generated through pay-per-use pricing or API subscription plans.
Integration Tip
Start with web_fetch for simple sites and escalate to stealth scripts for anti-bot protection, using environment variables to customize wait times and outputs.
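One way to automate this escalation is a cascade that tries each method in order and stops at the first one that returns content. The sketch below is hypothetical: `scrapeWithEscalation` and the shape of the `strategies` array are assumptions, and each `run` function would be your own wrapper around web_fetch or one of the scripts.

```javascript
// Try each strategy in order; the first one returning non-empty content wins.
// `strategies` is an array of { name, run } where run(url) resolves to a string.
async function scrapeWithEscalation(url, strategies) {
  const errors = [];
  for (const { name, run } of strategies) {
    try {
      const content = await run(url);
      if (content) return { method: name, content };
      errors.push(`${name}: empty content`);
    } catch (error) {
      errors.push(`${name}: ${error.message}`);
    }
  }
  throw new Error(`All strategies failed for ${url}: ${errors.join('; ')}`);
}
```

Ordering the strategies from cheapest to most expensive keeps simple sites fast and reserves the stealth script for targets that reject the lighter methods.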