afrexai-web-scraping-engine
Complete web scraping methodology: legal compliance, architecture design, anti-detection, data pipelines, and production operations. Use when building scrap...
Install via ClawdBot CLI:
clawdbot install 1kalin/afrexai-web-scraping-engine
Grade: Fair. Based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Sends data to undocumented external endpoint (potential exfiltration)
post → https://example.com/login
Calls external URL not in known-safe list
https://example.com/login
AI Analysis
The skill is a comprehensive web scraping guide focused on methodology, compliance, and architecture. The flagged external URL 'https://example.com/login' appears to be a placeholder or example used in a compliance checklist template, not an instruction for the AI to call an actual endpoint. No active data exfiltration, credential harvesting, or malicious overrides are present in the provided definition.
Audited Apr 16, 2026 · audit v1.0
Generated Mar 20, 2026
Automatically track competitor pricing and product availability across major online retailers. This enables dynamic pricing strategies and inventory management by collecting data daily from target e-commerce sites, focusing on static HTML product pages.
Scrape property listings from real estate portals to analyze pricing trends, location data, and market demand. This supports investment decisions and market reports by extracting structured data from JavaScript-rendered pages with basic anti-bot measures.
Collect job postings from career sites to monitor hiring trends, skill demands, and salary ranges. This aids HR departments and job seekers by parsing data from sites with consistent structures, using static scrapers for efficiency.
Extract headlines, articles, and publication dates from news websites for trend analysis and content aggregation. This serves media companies and researchers by handling sites with varied anti-bot protections and ensuring compliance with copyright rules.
Gather scientific publications, citations, and metadata from academic databases for literature reviews and analysis. This supports researchers by scraping public data while respecting robots.txt, and by using managed services for complex sites.
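The robots.txt check mentioned in the academic use case can be sketched with Python's standard-library `urllib.robotparser`. This is a minimal illustration, not the skill's own method: the robots.txt content, the `example-bot` user agent, and the URLs are all hypothetical, and the rules are parsed from an in-memory string so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would fetch this
# from the target site (for example with RobotFileParser.set_url + read).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# "example-bot" and both URLs are placeholders chosen for illustration.
allowed = is_allowed(parser, "example-bot", "https://example.com/papers/123")
blocked = is_allowed(parser, "example-bot", "https://example.com/private/data")

# Crawl-delay (in seconds) declared for this agent, or None if absent.
delay = parser.crawl_delay("example-bot")
```

A scraper built along these lines would skip any URL for which `is_allowed` returns False and wait at least `delay` seconds between requests to the same host.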
Offer subscription-based access to curated datasets extracted from web sources, such as pricing or market trends. Revenue is generated through monthly or annual fees from businesses needing reliable, updated data without in-house scraping.
Provide tailored web scraping services for clients in specific industries, handling legal compliance and technical challenges. Revenue comes from project-based contracts or retainer fees for ongoing data extraction and pipeline maintenance.
Build and sell APIs that aggregate data from multiple web sources, offering cleaned and structured data via endpoints. Revenue is generated through pay-per-use pricing or tiered API access plans for developers and enterprises.
💬 Integration Tip
Start with a legal compliance check using the provided YAML template, then select tools based on site complexity and anti-bot measures to avoid common pitfalls.
Scored Apr 19, 2026
A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.
Browser automation via Playwright MCP server. Navigate websites, click elements, fill forms, extract data, take screenshots, and perform full browser automation workflows.
Browser automation via Playwright MCP. Navigate websites, click elements, fill forms, take screenshots, extract data, and debug real browser workflows. Use w...
Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, navigate web pages, extract data from websites, take screenshots, fill forms, click buttons, or interact with web applications.
Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with w...
Headless browser automation CLI optimized for AI agents, with accessibility tree snapshots and ref-based element selection.