# url-fetcher

Fetch and save web content using only the Python standard library, with URL and path validation, basic HTML-to-markdown conversion, and no API keys or external dependencies.
Install via ClawdBot CLI:

```
clawdbot install johstracke/url-fetcher
```
```
url_fetcher.py fetch <url> [output_file]
url_fetcher.py fetch --markdown <url> [output_file]
```
Examples:

```bash
# Fetch and preview
url_fetcher.py fetch https://example.com

# Fetch and save HTML
url_fetcher.py fetch https://example.com ~/workspace/page.html

# Fetch and convert to basic markdown
url_fetcher.py fetch --markdown https://example.com ~/workspace/page.md

# Fetch multiple articles
url_fetcher.py fetch https://example.com/article1.md ~/workspace/research/article1.md
url_fetcher.py fetch https://example.com/article2.md ~/workspace/research/article2.md

# Convert to markdown for reading
url_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/research/article.md

# Fetch pages for processing
url_fetcher.py fetch https://news.example.com ~/workspace/content/latest.html

# Extract text
url_fetcher.py fetch --markdown https://blog.example.com ~/workspace/content/post.md

# Just preview content (no file save)
url_fetcher.py fetch https://example.com
```
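Under the hood, stdlib-only fetching with URL validation and a timeout looks roughly like the sketch below. This is an illustration under assumptions, not the actual `url_fetcher.py` implementation:

```python
import urllib.request
from urllib.parse import urlparse

def fetch(url: str, timeout: int = 10) -> str:
    """Fetch a URL using only the standard library, with basic validation.

    Hypothetical sketch; the real tool's code may differ.
    """
    scheme = urlparse(url).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    # Raises after `timeout` seconds (matches the tool's 10 s default)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Decode with error-ignore, as described in Troubleshooting below
        return resp.read().decode("utf-8", errors="ignore")
```

Rejecting non-HTTP schemes up front is what keeps a fetcher like this from being pointed at `file://` paths.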
```bash
#!/bin/bash
# batch_fetch.sh
URLS=(
  "https://example.com/page1"
  "https://example.com/page2"
  "https://example.com/page3"
)
OUTPUT_DIR="$HOME/workspace/fetched"
mkdir -p "$OUTPUT_DIR"
for url in "${URLS[@]}"; do
  # Derive a safe filename: strip the scheme, replace slashes
  filename=$(echo "$url" | sed 's|^https\?://||; s|/|_|g')
  url_fetcher.py fetch --markdown "$url" "$OUTPUT_DIR/$filename.md"
  sleep 1  # Be nice to servers
done
```
Combine with research-assistant:

```bash
# Fetch article
url_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/article.md
# Extract key points
# Then use research-assistant to organize findings
```
Combine with task-runner:

```bash
# Add task to fetch content
task_runner.py add "Fetch article on topic X" "research"
# Fetch when ready
url_fetcher.py fetch https://example.com/topic-x.md ~/workspace/research/topic-x.md
```
Error: Request timeout after 10s
Solution: The server is slow or unreachable. Try again later or check the URL.
Error: HTTP 403: Forbidden
Solution: The site blocks automated requests. Some servers reject non-browser clients outright; retry later, or fetch the page in a browser and save it manually.
Error with special characters
Solution: The tool uses UTF-8 with error-ignore. Some characters may be lost.
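To see what error-ignore decoding means in practice: bytes that are not valid UTF-8 are silently dropped rather than replaced.

```python
# 0xE9 is Latin-1 "é", which is not a valid UTF-8 sequence on its own
raw = b"caf\xe9"
text = raw.decode("utf-8", errors="ignore")
print(text)  # prints "caf": the invalid byte is dropped, not replaced
```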
Note: Basic markdown extraction
Solution: This tool uses simple regex for HTML→MD conversion, so complex or deeply nested pages may convert poorly. For better results, use one of the pip-installable alternatives listed below (e.g. markdownify).
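As an illustration only (not the tool's actual code), a regex-based converter of the kind described handles simple tags but breaks on nested or malformed markup:

```python
import re

def html_to_markdown(html: str) -> str:
    """Rough HTML -> markdown via regex; fine for simple pages only."""
    text = re.sub(r"(?is)<(script|style).*?</\1>", "", html)  # drop scripts/styles
    text = re.sub(r"(?is)<h1[^>]*>(.*?)</h1>", r"# \1\n", text)
    text = re.sub(r"(?is)<h2[^>]*>(.*?)</h2>", r"## \1\n", text)
    # Links: only handles double-quoted href attributes
    text = re.sub(r'(?is)<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>', r"[\2](\1)", text)
    text = re.sub(r"(?is)<(b|strong)>(.*?)</\1>", r"**\2**", text)
    text = re.sub(r"(?s)<[^>]+>", " ", text)  # strip all remaining tags
    text = re.sub(r"[ \t]+", " ", text)       # collapse runs of spaces
    text = re.sub(r" ?\n ?", "\n", text)      # trim spaces around newlines
    return text.strip()
```

Nested lists, tables, and attribute-heavy markup are exactly where this approach falls apart, which is why the pip-installable alternatives exist.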
From a Python script:

```python
from pathlib import Path
import subprocess

def fetch_and_process(url):
    """Fetch a URL as markdown and return the content."""
    output = Path.home() / "workspace" / "fetched" / "page.md"
    output.parent.mkdir(parents=True, exist_ok=True)
    # Fetch; check=True raises CalledProcessError if the fetch fails
    subprocess.run([
        "python3",
        "/path/to/url_fetcher.py",
        "fetch",
        "--markdown",
        url,
        str(output),
    ], check=True)
    # Process content
    return output.read_text()
```
```bash
# Function for fetching
fetch_content() {
  local url="$1"
  local output="$2"
  python3 ~/workspace/skills/url-fetcher/scripts/url_fetcher.py \
    fetch --markdown "$url" "$output"
}

# Usage
fetch_content "https://example.com" ~/workspace/example.md
```
For full-featured scraping:

- requests + beautifulsoup4 (requires pip install)
- scrapy framework (requires pip install)

For better markdown:

- markdownify library (requires pip install)

For complex workflows, a full scraping framework such as scrapy is the better fit.

This skill requires only Python 3 and its standard library: no pip installs, no API keys. Perfect for autonomous agents with budget constraints.
If you improve this skill, please contribute your changes back. Use freely in your OpenClaw skills and workflows.