xpr-web-scraping
Tools for fetching and extracting cleaned text, metadata, and links from single or multiple web pages, with format options and link filtering.

Install via ClawdBot CLI:
clawdbot install paulgnz/xpr-web-scraping

You have web scraping tools for fetching and extracting data from web pages:

Single page:
scrape_url — fetch a URL and get cleaned text content plus metadata (title, description, link count)

Link discovery:
extract_links — fetch a page and extract all links with their text and type (internal/external); use the pattern parameter to filter by regex (e.g. "\.pdf$" for PDF links)

Multi-page research:
scrape_multiple — fetch up to 10 URLs in parallel for comparison/research

Best practices:
Use store_deliverable to save scraped content as job evidence.

Generated Mar 1, 2026
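To make "cleaned text content + metadata" concrete, here is a minimal stand-alone sketch of that kind of extraction using only Python's standard library. The output fields (title, text, link count) mirror the tool description above, but the parser itself is an illustration, not the tool's actual implementation.

```python
# Sketch: extract title, visible text, and link count from raw HTML,
# approximating the shape of scrape_url's output. Sample HTML is invented.
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect the <title>, visible text, and link count from raw HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False
        self.text_parts = []
        self.link_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self.link_count += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text_parts.append(data.strip())


raw = "<html><head><title>Demo</title></head><body><p>Hello</p><a href='/x'>link</a></body></html>"
page = PageExtractor()
page.feed(raw)
result = {"title": page.title, "text": " ".join(page.text_parts), "links": page.link_count}
print(result)
```

In practice you would hand this work to scrape_url rather than parse HTML yourself; the sketch only shows what "cleaned" output of this shape looks like.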
Scrape e-commerce product pages daily to track competitor pricing and promotions. Use scrape_multiple for parallel monitoring of key competitors and store_deliverable to log changes over time for strategic adjustments.
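The parallel-monitoring pattern behind scrape_multiple can be sketched with a thread pool capped at 10 workers, matching the tool's documented URL limit. The fetch function and price table below are invented stand-ins for real scrape calls.

```python
# Sketch: fetch several competitor pages in parallel and find the
# cheapest price. FAKE_PRICES stands in for a live scrape of each URL.
from concurrent.futures import ThreadPoolExecutor

COMPETITOR_URLS = [
    "https://example.com/shop-a/widget",
    "https://example.com/shop-b/widget",
    "https://example.com/shop-c/widget",
]

# Invented price table; a real implementation would parse scraped text.
FAKE_PRICES = {url: 9.99 + i for i, url in enumerate(COMPETITOR_URLS)}


def fetch_price(url: str) -> tuple[str, float]:
    # In real use this would call scrape_url and extract the price.
    return url, FAKE_PRICES[url]


with ThreadPoolExecutor(max_workers=10) as pool:  # scrape_multiple's cap
    prices = dict(pool.map(fetch_price, COMPETITOR_URLS))

cheapest = min(prices, key=prices.get)
print(cheapest, prices[cheapest])
```

Logging each day's `prices` dict via store_deliverable gives the change history the use case describes.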
Extract latest articles from financial news websites using scrape_url with format='markdown' to preserve structure. Combine with extract_links to discover related reports, enabling real-time market trend analysis for investment decisions.
Scrape multiple scholarly articles or PDF links from university websites using extract_links with pattern='\.pdf$'. Use scrape_url to fetch text content for analysis, supporting literature reviews or data mining in academic projects.
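The pattern='\.pdf$' filter used here behaves like an ordinary regex search over each URL, which can be reproduced with the standard re module. The link list below is invented sample data.

```python
# Sketch: replicate extract_links' pattern filter for PDF links.
import re

links = [
    "https://example.edu/papers/smith2024.pdf",
    "https://example.edu/about",
    "https://example.edu/theses/lee2023.pdf",
]

pdf_links = [u for u in links if re.search(r"\.pdf$", u)]
print(pdf_links)
```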
Fetch property details from real estate portals using scrape_url with format='text' to extract cleaned data like prices and descriptions. Apply scrape_multiple to gather listings from various sources for market comparison and client reports.
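Once scrape_url has reduced a listing page to cleaned text with format='text', pulling out figures like prices is plain regex work. The listing text below is invented for illustration.

```python
# Sketch: extract dollar amounts from cleaned listing text.
import re

listing_text = "3 bed / 2 bath townhouse. Asking $425,000. HOA $310/month."

matches = re.findall(r"\$([\d,]+)", listing_text)
amounts = [int(m.replace(",", "")) for m in matches]
print(amounts)
```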
Scrape job postings from career sites to analyze hiring trends and skill demands. Use extract_links to filter internal job pages and scrape_url to extract key details, aiding recruitment agencies or HR departments in strategy planning.
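The internal/external distinction extract_links reports can be sketched by comparing hostnames with urllib.parse: a link is internal when it is relative or shares the base page's host. The sample URLs are invented.

```python
# Sketch: classify links as internal to a careers site or external.
from urllib.parse import urlparse

base = "https://careers.example.com/jobs"
links = [
    "https://careers.example.com/jobs/123",
    "https://linkedin.com/company/example",
    "/jobs/456",  # relative links are internal by definition
]


def is_internal(link: str, base_url: str) -> bool:
    host = urlparse(link).netloc
    return host == "" or host == urlparse(base_url).netloc


internal = [l for l in links if is_internal(l, base)]
print(internal)
```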
Offer subscription-based access to scraped data feeds, such as daily price updates or news summaries. Revenue comes from monthly or annual fees charged to clients like retailers or analysts who rely on timely, structured data.
Provide tailored web scraping services for specific client needs, such as one-time data extraction for market research or ongoing monitoring. Revenue is generated through project-based contracts or hourly consulting rates.
Integrate the scraping tools into third-party platforms via APIs, enabling partners to offer data extraction features. Revenue streams include licensing fees, usage-based pricing, or revenue sharing from enhanced platform capabilities.
💬 Integration Tip
Combine scrape_url with store_deliverable to automatically save extracted content as evidence in job workflows, ensuring data traceability. Follow best practices such as rate limiting when scraping at volume.
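The rate-limiting half of this tip amounts to spacing out successive scrape calls with a fixed delay. In the sketch below, do_scrape stands in for the real scrape_url call, and the one-second delay is an arbitrary illustrative value.

```python
# Sketch: enforce a fixed delay between successive scrape requests.
import time

DELAY_SECONDS = 1.0
urls = ["https://example.com/a", "https://example.com/b"]

timestamps = []


def do_scrape(url: str) -> None:
    timestamps.append(time.monotonic())  # record when each call fires


for i, url in enumerate(urls):
    if i > 0:
        time.sleep(DELAY_SECONDS)  # gap between consecutive requests
    do_scrape(url)

gap = timestamps[1] - timestamps[0]
print(f"gap between requests: {gap:.2f}s")
```

A real deployment would also honor robots.txt and back off on HTTP 429 responses, which this sketch does not attempt.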
Related skills:
- Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with w...
- Playwright-based web scraping OpenClaw Skill with anti-bot protection. Successfully tested on complex sites like Discuss.com.hk.
- Browser automation and web scraping with Playwright. Forms, screenshots, data extraction. Works standalone or via MCP. Testing included.
- Performs deep scraping of complex sites like YouTube using containerized Crawlee, extracting validated, ad-free transcripts and content as JSON output.
- Automate web tasks like form filling, data scraping, testing, monitoring, and scheduled jobs with multi-browser support and retry mechanisms.
- Web scraping and content comprehension agent — multi-strategy extraction with cascade fallback, news detection, boilerplate removal, structured metadata, and...