desearch-crawl
Crawl/scrape and extract content from any webpage URL. Returns the page content as clean text or raw HTML. Use this when you need to read the full contents o...
Install via ClawdBot CLI:
clawdbot install okradze/desearch-crawl
Extract content from any webpage URL. Returns clean text or raw HTML.
export DESEARCH_API_KEY='your-key-here'
# Crawl a webpage (returns clean text by default)
scripts/desearch.py crawl "https://en.wikipedia.org/wiki/Artificial_intelligence"
# Get raw HTML
scripts/desearch.py crawl "https://example.com" --crawl-format html
| Option | Description |
|--------|-------------|
| --crawl-format | Output content format: text (default) or html |
scripts/desearch.py crawl "https://docs.python.org/3/tutorial/index.html"
scripts/desearch.py crawl "https://example.com/page" --crawl-format html
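When the CLI is driven from another Python program, the invocations above can be assembled as an argument list. The following is a minimal sketch, not part of the skill itself: the helper names `build_crawl_command` and `run_crawl` are hypothetical, and it assumes the script sits at `scripts/desearch.py` relative to the working directory, can be started with the current Python interpreter, and prints the page content to stdout.

```python
import subprocess
import sys

VALID_FORMATS = ("text", "html")  # the two values documented for --crawl-format


def build_crawl_command(url, crawl_format="text", script="scripts/desearch.py"):
    """Return the argv list for one crawl invocation (hypothetical helper)."""
    if crawl_format not in VALID_FORMATS:
        raise ValueError(f"crawl_format must be one of {VALID_FORMATS}")
    # Assumption: the script is launched through the current interpreter.
    cmd = [sys.executable, script, "crawl", url]
    # text is the documented default, so the flag is only added for html.
    if crawl_format != "text":
        cmd += ["--crawl-format", crawl_format]
    return cmd


def run_crawl(url, crawl_format="text"):
    """Run the CLI and return its stdout (assumes content is printed to stdout)."""
    result = subprocess.run(
        build_crawl_command(url, crawl_format),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the URL as a separate list element, rather than interpolating it into a shell string, avoids quoting problems with URLs that contain `&` or `?`.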
Output (format=text, truncated, default):
Artificial intelligence (AI) is the capability of computational systems to perform tasks that typically require human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making...
Output (format=html, truncated):
<!DOCTYPE html>
<html>
<head><title>Artificial intelligence - Wikipedia</title></head>
<body>
<p>Artificial intelligence (AI) is the capability of computational systems...</p>
</body>
</html>
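HTML output is mainly useful for inspecting page structure. As an illustration only, the sketch below pulls the title and paragraph text out of the truncated sample above using Python's standard `html.parser`. The class name `TitleAndParagraphs` is hypothetical, and real pages will usually need more careful handling than this.

```python
from html.parser import HTMLParser


class TitleAndParagraphs(HTMLParser):
    """Collect the <title> text and the text of each <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # tag currently being captured, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "p":
            self.paragraphs[-1] += data


# The truncated sample shown above, used here as test input.
sample = """<!DOCTYPE html>
<html>
<head><title>Artificial intelligence - Wikipedia</title></head>
<body>
<p>Artificial intelligence (AI) is the capability of computational systems...</p>
</body>
</html>"""

parser = TitleAndParagraphs()
parser.feed(sample)
print(parser.title)            # Artificial intelligence - Wikipedia
print(len(parser.paragraphs))  # 1
```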
Default to text. Use --crawl-format html only when you need to inspect page structure.
Prefer text format to avoid bloating the agent context with markup.
Status 401, Unauthorized (e.g., missing/invalid API key)
{
"detail": "Invalid or missing API key"
}
Status 402, Payment Required (e.g., balance depleted)
{
"detail": "Insufficient balance, please add funds to your account to continue using the service."
}
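Both error bodies above share a single `detail` field, so a caller can turn them into one readable line. The source does not say how the CLI surfaces these responses (exit code, stderr, or stdout), so this sketch assumes the caller already holds the HTTP status code and the raw body. `describe_error` and `STATUS_HINTS` are hypothetical names.

```python
import json

# Hints for the two statuses documented above; other codes get no hint.
STATUS_HINTS = {
    401: "check that DESEARCH_API_KEY is set and valid",
    402: "add funds to the account",
}


def describe_error(status, body):
    """Turn a status code and error body into one readable line (hypothetical helper)."""
    try:
        detail = json.loads(body).get("detail", "")
    except (json.JSONDecodeError, AttributeError):
        # Body was not a JSON object; show it verbatim.
        detail = body.strip()
    hint = STATUS_HINTS.get(status)
    message = f"HTTP {status}: {detail}"
    return f"{message} ({hint})" if hint else message
```

For example, `describe_error(402, body)` with the 402 body above yields the service's own message followed by the "add funds" hint.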
Generated Mar 1, 2026
Businesses can use this skill to scrape competitor websites for product details, pricing, and content strategies. It helps in gathering data for benchmarking and identifying market trends efficiently.
Media companies can crawl news websites to aggregate articles, summaries, or raw HTML for content curation and analysis. This supports creating news feeds, monitoring coverage, and extracting insights from various sources.
Researchers and academics can extract text from online journals, databases, or educational websites to compile data for studies. It aids in literature reviews, data mining, and analyzing web-based information.
Digital marketing agencies can crawl client or competitor sites to analyze HTML structure, meta tags, and content for SEO optimization. This helps in identifying issues and improving search engine rankings.
Law firms or compliance teams can scrape regulatory websites or public records to track updates, extract legal texts, and ensure adherence to laws. It streamlines monitoring changes in policies or regulations.
Offer the crawling functionality as a paid API service, charging based on usage volume or subscription tiers. This model targets developers and businesses needing scalable web scraping without infrastructure management.
Collect and process web data using this skill, then sell aggregated datasets or analytical reports to clients. This adds value by transforming raw content into actionable business intelligence.
Incorporate the crawling skill into existing software platforms, such as CRM or marketing tools, as an add-on feature. This enhances product offerings and generates revenue through upselling or premium integrations.
💬 Integration Tip
Ensure the DESEARCH_API_KEY is securely stored as an environment variable to avoid exposure in code, and test with sample URLs to verify format outputs before full deployment.
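One way to follow this tip is to check for the key before any crawl is attempted, so a missing variable fails with a clear message instead of a 401. This is a minimal sketch; `get_api_key` is a hypothetical helper, and treating the placeholder `your-key-here` as unset is an assumption added for illustration.

```python
import os


def get_api_key(environ=None):
    """Return the Desearch API key from the environment or raise a clear error."""
    environ = os.environ if environ is None else environ
    key = environ.get("DESEARCH_API_KEY", "").strip()
    # Assumption: the documentation placeholder counts as "not configured".
    if not key or key == "your-key-here":
        raise RuntimeError(
            "DESEARCH_API_KEY is not set; export it before running the crawl script"
        )
    return key
```

Accepting an optional mapping keeps the function easy to test without touching the real process environment.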