ClawHub Skills Lib

Tags: skill-spotlight · docs-office · markdown-convert · clawhub · openclaw · rag

Markdown.new Skill: Convert Any Public URL to Clean Markdown for AI Workflows

March 11, 2026 · 6 min read

16,722 downloads, 38 stars. The Markdown.new Skill by @joelchance converts public web pages into clean, LLM-ready Markdown using the markdown.new service. No browser automation, no HTML parsing — just a Python script that returns structured text your agent can immediately reason over.

For workflows that need to pull content from the web — summarization, RAG ingestion, extraction, archiving — this is the lean path.

The Problem It Solves

Web pages are hostile to LLMs. Raw HTML is bloated with navigation, ads, scripts, and layout elements. Even after stripping tags, you're often left with broken paragraphs, repeated menus, and footer noise. And then there's the JavaScript-rendered content that doesn't exist until a browser executes it.

The Markdown.new skill handles this upstream. The markdown.new service converts pages using the best available method — fast pipeline, AI-assisted conversion, or full headless browser rendering — and returns clean Markdown. Your agent gets text, not HTML.

How It Works

The skill ships a single Python script: markdown_new_fetch.py. It wraps the markdown.new API with method selection, output control, and response metadata capture.

# From the skill directory
python3 scripts/markdown_new_fetch.py 'https://example.com' > page.md
 
# Or with absolute path from anywhere
python3 ~/.codex/skills/markdown-new/scripts/markdown_new_fetch.py 'https://example.com'

Path note: Always use the skill directory or an absolute path. Relative paths like scripts/... from an arbitrary workspace root will fail.

Core Options

Method Selection

# Auto (default) — fastest successful pipeline
python3 scripts/markdown_new_fetch.py 'https://example.com' --method auto
 
# AI conversion — Workers AI HTML-to-Markdown
python3 scripts/markdown_new_fetch.py 'https://example.com' --method ai
 
# Browser — headless rendering for JS-heavy pages
python3 scripts/markdown_new_fetch.py 'https://example.com' --method browser

The recommended workflow: try auto first. If the output misses JavaScript-rendered content, retry with --method browser. Only use --method browser when needed — it's slower.
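That auto-then-browser retry can be scripted around the skill. A minimal sketch, assuming you run it from the skill directory; the "very short output means failed extraction" heuristic is ours, not part of the skill:

```python
import subprocess


def fetch_with_fallback(url: str, script: str = "scripts/markdown_new_fetch.py") -> str:
    """Try --method auto first; if the result looks empty, retry with
    --method browser, which is slower but renders JavaScript."""
    output = ""
    for method in ("auto", "browser"):
        result = subprocess.run(
            ["python3", script, url, "--method", method],
            capture_output=True,
            text=True,
        )
        output = result.stdout
        # Crude completeness heuristic: treat very short output as a miss.
        if result.returncode == 0 and len(output) > 200:
            break
    return output
```

In practice you might tune the length threshold per site, or check for a known marker string from the page instead.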

Image Retention

By default, images are stripped. If your workflow needs image URLs:

python3 scripts/markdown_new_fetch.py 'https://example.com' --retain-images

Delivery Mode

For file-based workflows, --deliver-md forces Markdown file output with auto-generated filenames:

# Outputs to auto-named .md file
python3 scripts/markdown_new_fetch.py 'https://example.com' --deliver-md
 
# Explicit output path
python3 scripts/markdown_new_fetch.py 'https://example.com' --output page.md

In delivery mode, content is wrapped with <url> tags so your agent can track which content came from which source:

<url>
...markdown content...
</url>
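If you post-process delivered files, a small helper can strip that wrapper. This sketch assumes the literal <url>...</url> framing shown above:

```python
import re


def unwrap(delivered: str) -> str:
    """Return the Markdown body from markdown.new's delivery-mode wrapper,
    or the input unchanged if no <url> tags are present."""
    match = re.search(r"<url>\s*(.*?)\s*</url>", delivered, flags=re.DOTALL)
    return match.group(1) if match else delivered
```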

Token Metadata

One underrated feature: the script captures response headers that report the token count of the converted content and your remaining quota:

x-markdown-tokens        # Token count of the converted content
x-rate-limit-remaining   # How many conversions you have left today

This is useful for RAG pipelines where you need to plan chunking before ingestion — you know the size of what's coming before you process it.
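As an illustration, a hypothetical planner can turn the x-markdown-tokens value into an expected chunk count before ingestion. The chunk_size and overlap values here are illustrative defaults, not part of the skill:

```python
import math


def plan_chunks(token_count: int, chunk_size: int = 512, overlap: int = 64) -> int:
    """Estimate how many sliding-window chunks a document of
    `token_count` tokens will produce."""
    if token_count <= chunk_size:
        return 1
    stride = chunk_size - overlap  # how far each window advances
    return 1 + math.ceil((token_count - chunk_size) / stride)
```

With the defaults, a page reporting 2,048 tokens plans out to five overlapping chunks, so you can size batches before fetching anything else.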

Rate Limits

The service allows 500 requests/day per IP on the free tier. For single-session agent workflows this is ample. For large-scale batch extraction, you'll need to pace requests or arrange higher limits directly with the service.

API Modes (Direct)

If you prefer calling the API directly:

# POST mode (recommended for automation)
POST https://markdown.new/
{"url": "https://example.com", "method": "auto", "retain_images": false}
 
# Prefix mode
GET https://markdown.new/https://example.com?method=browser&retain_images=true
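For Python automation, the POST mode maps directly onto the standard library. A sketch, assuming the request body and header names shown in this post; error handling and retries are omitted:

```python
import json
import urllib.request

API = "https://markdown.new/"


def build_request(url: str, method: str = "auto", retain_images: bool = False) -> urllib.request.Request:
    """Build the POST request shown above: one public URL per call, JSON body."""
    body = json.dumps(
        {"url": url, "method": method, "retain_images": retain_images}
    ).encode("utf-8")
    return urllib.request.Request(
        API, data=body, headers={"Content-Type": "application/json"}
    )


def fetch_markdown(url: str, **kwargs):
    """Return (markdown_text, token_count) for a public URL."""
    with urllib.request.urlopen(build_request(url, **kwargs)) as resp:
        text = resp.read().decode("utf-8")
        tokens = resp.headers.get("x-markdown-tokens")
    return text, int(tokens) if tokens else None
```

Keeping request construction separate from the network call makes the payload easy to inspect and test without hitting the service.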

Real-World Use Cases

RAG ingestion pipeline — Convert 50 documentation pages to Markdown overnight, ingest into a vector store. The token metadata helps you decide batch sizes.

Summarization — fetch URL → get Markdown → summarize in agent is cleaner than browser automation chains.

Archiving — Convert important pages to Markdown for version-controlled storage. Markdown is searchable, diffable, and human-readable.

Content monitoring — Fetch a page weekly, compare the Markdown output with last week's version to detect changes.

Research assistance — Agent fetches all sources in a bibliography and converts them to Markdown for analysis.
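The content-monitoring idea above needs nothing beyond the standard library. A sketch comparing two saved snapshots of the same page:

```python
import difflib
from pathlib import Path


def markdown_diff(old_path: str, new_path: str) -> str:
    """Unified diff between two Markdown snapshots of the same page,
    e.g. last week's and this week's markdown.new output.
    Returns an empty string when nothing changed."""
    old = Path(old_path).read_text().splitlines(keepends=True)
    new = Path(new_path).read_text().splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old, new, fromfile=old_path, tofile=new_path)
    )
```

Because the input is Markdown rather than HTML, the diff reflects real content changes instead of markup churn.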

Comparison

| Feature | Markdown.new Skill | Brave Search (--content) | Browser Automation |
| --- | --- | --- | --- |
| Clean Markdown output | ✅ | ✅ | Manual |
| JS-rendered content | ✅ (--method browser) | ❌ | ✅ |
| Image retention | ✅ opt-in | ❌ | ✅ |
| Token metadata | ✅ | ❌ | ❌ |
| No browser required | ✅ (auto/ai methods) | ✅ | ❌ |
| Rate limit | 500/day/IP | 2k/mo (paid) | No limit |
| API key required | ❌ | ✅ | Varies |

The key differentiator: no API key, token metadata, and the --method browser fallback for JS-heavy pages.

Practical Tips

Use auto first, browser only when needed. Browser rendering is slower and more expensive. Most static pages convert perfectly with auto.

Pipe to files for large jobs. Don't rely on stdout for large documents:

python3 scripts/markdown_new_fetch.py 'https://docs.example.com/api-reference' --output api-ref.md

Monitor x-rate-limit-remaining. Check the response metadata if you're running batch jobs so you don't run into the daily limit mid-pipeline.

Skip image retention unless needed. Image URLs add noise to Markdown that agents don't need for text-based analysis. Only enable --retain-images when the workflow actually uses the image URLs.

Don't use for private/authenticated pages. The service fetches publicly accessible URLs. Pages behind login or paywalls won't convert correctly.

Considerations

  • Public URLs only — No authentication support. Pages behind login, paywalls, or VPNs won't work.
  • Dynamic content — Even with --method browser, some pages that require user interaction to load content (infinite scroll, click-to-load) may be incomplete.
  • Rate limit — 500 requests/day/IP. Shared IP environments (corporate NAT) may exhaust this faster than expected.
  • Accuracy — The skill notes that output isn't "guaranteed complete for every page" — verify critical extractions, especially for tables and structured data.
  • Copyright and ToS — Convert only content you're permitted to process. The skill explicitly notes respecting robots.txt and terms of service.

The Bigger Picture

Web content ingestion is foundational to almost every agent workflow that goes beyond local files. The Markdown.new skill makes this as simple as it can be — one command, clean output, no browser overhead for the common case. The method fallback (auto → browser) handles the JS-rendered content edge case that breaks simpler approaches.

With more than 16,000 downloads, it appears to have become a standard utility layer in agent workflows that need to reason about web content.


View the skill on ClawHub: markdown-convert

← Back to Blog