# anycrawl

Perform high-performance web scraping, crawling, and Google search with multi-engine support and structured data extraction via the AnyCrawl API.

AnyCrawl API integration for OpenClaw: scrape, crawl, and search web content with high-performance multi-threaded crawling.

Install via ClawdBot CLI:

```
clawdbot install techlaai/anycrawl
```
## Setup

Set your API key as an environment variable:

```
export ANYCRAWL_API_KEY="your-api-key"
```

Make it permanent by adding it to `~/.bashrc` or `~/.zshrc`:

```
echo 'export ANYCRAWL_API_KEY="your-api-key"' >> ~/.bashrc
source ~/.bashrc
```

Get your API key at: https://anycrawl.dev

Alternatively, set the key via the OpenClaw config:

```
openclaw config.patch --set ANYCRAWL_API_KEY="your-api-key"
```
## anycrawl_scrape

Scrape a single URL and convert it to LLM-ready structured data.

Parameters:

- `url` (string, required): URL to scrape
- `engine` (string, optional): Scraping engine: `"cheerio"` (default), `"playwright"`, `"puppeteer"`
- `formats` (array, optional): Output formats: `["markdown"]`, `["html"]`, `["text"]`, `["json"]`, `["screenshot"]`
- `timeout` (number, optional): Timeout in milliseconds (default: 30000)
- `wait_for` (number, optional): Delay before extraction in ms (browser engines only)
- `wait_for_selector` (string/object/array, optional): Wait for CSS selectors
- `include_tags` (array, optional): Include only these HTML tags (e.g., `["h1", "p", "article"]`)
- `exclude_tags` (array, optional): Exclude these HTML tags
- `proxy` (string, optional): Proxy URL (e.g., `"http://proxy:port"`)
- `json_options` (object, optional): JSON extraction with schema/prompt
- `extract_source` (string, optional): `"markdown"` (default) or `"html"`

Examples:
```javascript
// Basic scrape with the default cheerio engine
anycrawl_scrape({ url: "https://example.com" })

// Scrape an SPA with Playwright
anycrawl_scrape({
  url: "https://spa-example.com",
  engine: "playwright",
  formats: ["markdown", "screenshot"]
})

// Extract structured JSON
anycrawl_scrape({
  url: "https://product-page.com",
  engine: "cheerio",
  json_options: {
    schema: {
      type: "object",
      properties: {
        product_name: { type: "string" },
        price: { type: "number" },
        description: { type: "string" }
      },
      required: ["product_name", "price"]
    },
    user_prompt: "Extract product details from this page"
  }
})
```
## anycrawl_search

Search Google and return structured results.

Parameters:

- `query` (string, required): Search query
- `engine` (string, optional): Search engine: `"google"` (default)
- `limit` (number, optional): Max results per page (default: 10)
- `offset` (number, optional): Number of results to skip (default: 0)
- `pages` (number, optional): Number of pages to retrieve (default: 1, max: 20)
- `lang` (string, optional): Language locale (e.g., `"en"`, `"zh"`, `"vi"`)
- `safe_search` (number, optional): 0 (off), 1 (medium), 2 (high)
- `scrape_options` (object, optional): Scrape each result URL with these options

Examples:
```javascript
// Basic search
anycrawl_search({ query: "OpenAI ChatGPT" })

// Multi-page search in Vietnamese ("hướng dẫn Node.js" = "Node.js tutorial")
anycrawl_search({
  query: "hướng dẫn Node.js",
  pages: 3,
  lang: "vi"
})

// Search and auto-scrape results
anycrawl_search({
  query: "best AI tools 2026",
  limit: 5,
  scrape_options: {
    engine: "cheerio",
    formats: ["markdown"]
  }
})
```
## anycrawl_crawl_start

Start crawling an entire website (async job).

Parameters:

- `url` (string, required): Seed URL to start crawling
- `engine` (string, optional): `"cheerio"` (default), `"playwright"`, `"puppeteer"`
- `strategy` (string, optional): `"all"`, `"same-domain"` (default), `"same-hostname"`, `"same-origin"`
- `max_depth` (number, optional): Max depth from the seed URL (default: 10)
- `limit` (number, optional): Max pages to crawl (default: 100)
- `include_paths` (array, optional): Path patterns to include (e.g., `["/blog/*"]`)
- `exclude_paths` (array, optional): Path patterns to exclude (e.g., `["/admin/*"]`)
- `scrape_paths` (array, optional): Only scrape URLs matching these patterns
- `scrape_options` (object, optional): Per-page scrape options

Examples:
```javascript
// Crawl an entire website
anycrawl_crawl_start({
  url: "https://docs.example.com",
  engine: "cheerio",
  max_depth: 5,
  limit: 50
})

// Crawl only blog posts
anycrawl_crawl_start({
  url: "https://example.com",
  strategy: "same-domain",
  include_paths: ["/blog/*"],
  exclude_paths: ["/blog/tags/*"],
  scrape_options: {
    formats: ["markdown"]
  }
})

// Crawl product pages only
anycrawl_crawl_start({
  url: "https://shop.example.com",
  strategy: "same-domain",
  scrape_paths: ["/products/*"],
  limit: 200
})
```
## anycrawl_crawl_status

Check crawl job status.

Parameters:

- `job_id` (string, required): Crawl job ID

Example:

```javascript
anycrawl_crawl_status({ job_id: "7a2e165d-8f81-4be6-9ef7-23222330a396" })
```
## anycrawl_crawl_results

Get crawl results (paginated).

Parameters:

- `job_id` (string, required): Crawl job ID
- `skip` (number, optional): Number of results to skip (default: 0)

Examples:

```javascript
// Get the first 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 0 })

// Get the next 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 100 })
```
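The status and results calls compose into a simple collect-all pattern. A minimal sketch in the same call style; note that the `data.status` value `"completed"` and the shape of `data` in the results response are assumptions, not confirmed API fields:

```javascript
// Hypothetical polling pattern: check the job, then page through results
// in batches until an empty page is returned.
const job_id = "7a2e165d-8f81-4be6-9ef7-23222330a396"
const status = anycrawl_crawl_status({ job_id })
if (status.data.status === "completed") {  // assumed status value
  const results = []
  let skip = 0
  let page
  do {
    page = anycrawl_crawl_results({ job_id, skip })
    results.push(...page.data)   // assumed: data is an array of page results
    skip += page.data.length
  } while (page.data.length > 0)
}
```

For long crawls, add a delay between status checks rather than polling in a tight loop.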
## anycrawl_crawl_cancel

Cancel a running crawl job.

Parameters:

- `job_id` (string, required): Crawl job ID

## anycrawl_search_and_scrape

Quick helper: search Google, then scrape the top results.

Parameters:

- `query` (string, required): Search query
- `max_results` (number, optional): Max results to scrape (default: 3)
- `scrape_engine` (string, optional): Engine for scraping (default: `"cheerio"`)
- `formats` (array, optional): Output formats (default: `["markdown"]`)
- `lang` (string, optional): Search language

Example:
```javascript
anycrawl_search_and_scrape({
  query: "latest AI news",
  max_results: 5,
  formats: ["markdown"]
})
```
## Engine Comparison

| Engine | Best For | Speed | JS Rendering |
|--------|----------|-------|--------------|
| cheerio | Static HTML, news, blogs | ⚡ Fastest | ❌ No |
| playwright | SPAs, complex web apps | 🐢 Slower | ✅ Yes |
| puppeteer | Chrome-specific, metrics | 🐢 Slower | ✅ Yes |
## Response Format

All responses follow this structure:

```json
{
  "success": true,
  "data": { ... },
  "message": "Optional message"
}
```

Error response:

```json
{
  "success": false,
  "error": "Error type",
  "message": "Human-readable message"
}
```
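Since every call returns the same `success`/`data`/`error` envelope, callers can unwrap it uniformly. A minimal sketch; `unwrap` is a hypothetical helper, not part of the AnyCrawl API:

```javascript
// Unwrap the response envelope: return the payload on success,
// otherwise raise with both the error type and the readable message.
function unwrap(res) {
  if (res.success) {
    return res.data
  }
  throw new Error(`${res.error}: ${res.message}`)
}

// Usage with a successful envelope:
const page = unwrap({ success: true, data: { title: "Example" } })
// page is now { title: "Example" }
```

Centralizing this check keeps retry and error-reporting logic in one place instead of scattered across every tool call.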
HTTP error codes:

- 400: Bad Request (validation errors)
- 401: Unauthorized (invalid API key)
- 402: Payment Required (insufficient credits)
- 404: Not Found
- 429: Rate limit exceeded
- 500: Internal server error

Generated Mar 1, 2026

## Use Cases
Scrape product details, pricing, and descriptions from competitor websites using the anycrawl_scrape function with JSON extraction. This enables businesses to monitor market trends, adjust pricing strategies, and identify gaps in their own offerings. It's ideal for dynamic pricing models and product catalog analysis.
Use anycrawl_search_and_scrape to gather the latest articles from multiple news sources based on specific queries. This helps in creating curated news feeds, summarizing trends, and providing up-to-date content for media outlets or research firms. It supports multi-language searches for global coverage.
Crawl an entire website with anycrawl_crawl_start to map all pages, extract content in markdown format, and analyze structure for SEO optimization. This assists in migrating sites to new platforms, identifying broken links, and ensuring content consistency. It's useful for web development agencies and digital marketers.
Scrape scholarly articles, datasets, or public reports from various sources using anycrawl_scrape with browser engines for JavaScript-heavy sites. Researchers can automate data gathering for literature reviews, trend analysis, or building datasets, saving time on manual collection. It supports structured output for easy analysis.
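This workflow can be sketched in the same call style, combining a browser engine with a selector wait so content rendered by JavaScript is present before extraction. The URL, selector, and timeout here are illustrative, not real endpoints:

```javascript
// Sketch: scrape a JS-heavy article page with Playwright,
// waiting for the article body to render before extracting.
anycrawl_scrape({
  url: "https://journal.example.org/article/123",  // illustrative URL
  engine: "playwright",
  wait_for_selector: ".article-body",              // illustrative selector
  formats: ["markdown"],
  timeout: 60000
})
```

Browser engines are slower than cheerio, so reserve them for pages that genuinely require rendering.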
Search for companies or services using anycrawl_search with language and safe search filters, then scrape contact details or service pages. Sales teams can identify potential clients, gather insights on their offerings, and build targeted outreach lists. This streamlines prospecting efforts in competitive markets.
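A prospecting run like this can be sketched as a single filtered search that also scrapes each hit. The query is illustrative:

```javascript
// Sketch: localized, filtered search with per-result scraping
anycrawl_search({
  query: "managed IT services Hanoi",  // illustrative query
  lang: "vi",
  safe_search: 1,
  limit: 10,
  scrape_options: {
    engine: "cheerio",
    formats: ["markdown"]
  }
})
```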
Offer a cloud-based service where users pay a monthly fee to access the AnyCrawl API for automated data extraction. This model provides scalable usage tiers based on API calls or data volume, targeting businesses that need regular web data without infrastructure setup. Revenue comes from recurring subscriptions and premium support.
Provide bespoke data scraping solutions for clients in specific industries, such as real estate or finance, using the skill's functions to gather and structure data. Charge per project or on a retainer basis for delivering cleaned, analyzed datasets. This model leverages the API's flexibility for tailored client needs.
Use anycrawl_search_and_scrape to aggregate product reviews or deals from various sites, then monetize through affiliate links on a content platform. This model generates revenue from commissions on sales driven by the aggregated content, appealing to niche markets like tech gadgets or travel.
💬 Integration Tip
Set the ANYCRAWL_API_KEY via environment variables for secure, persistent access across deployments, and use the cheerio engine for fast static scraping to optimize performance.