mineru
Parse PDF, Word, PPT, and image files into Markdown with the MinerU API, with support for formulas, tables, and OCR. Suited to paper parsing and document extraction.
Install via ClawdBot CLI:
clawdbot install EasonAI-5589/mineru
By OpenDataLab
PDF/Word/PPT/images → structured Markdown, with formulas and tables fully preserved!
| Resource | Link |
|------|------|
| Official site | https://mineru.net/ |
| API docs | https://mineru.net/apiManage/docs |
| GitHub | https://github.com/opendatalab/MinerU |

| Type | Format |
|------|------|
| 📕 PDF | Papers, books, scanned documents |
| 📝 Word | .docx |
| 📊 PPT | .pptx |
| 🖼️ Images | .jpg, .png (OCR) |
# Authentication header
Authorization: Bearer {YOUR_API_KEY}
# 1. Submit a task
curl -X POST "https://mineru.net/api/v4/extract/task" \
  -H "Authorization: Bearer $MINERU_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://arxiv.org/pdf/2410.17247",
    "enable_formula": true,
    "enable_table": true,
    "layout_model": "doclayout_yolo",
    "language": "en"
  }'
# Returns: {"task_id": "xxx", "status": "pending"}
# 2. Poll for the result
curl "https://mineru.net/api/v4/extract/task/{task_id}" \
  -H "Authorization: Bearer $MINERU_TOKEN"
# Returns: {"status": "done", "result": {...}}
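The submit-then-poll flow above can be wrapped in a small helper. This is a minimal Python sketch, assuming the response fields shown in the example comments (`status`, `result`); the live API may structure its responses differently, so check the API docs. The `failed` status, the `poll_task` and `http_get_json` names, and the `fetch`/`sleep` parameters are illustrative assumptions, not part of MinerU.

```python
import json
import time
import urllib.request

BASE_URL = "https://mineru.net/api/v4"  # taken from the curl examples


def http_get_json(url, token):
    """GET a URL with the Bearer token and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def poll_task(task_id, token, fetch=http_get_json, interval=5.0,
              max_attempts=60, sleep=time.sleep):
    """Poll a task until its status is "done", then return the response body.

    Assumes the response shape from the example comments:
    {"status": ..., "result": ...}. The "failed" status is an assumption.
    """
    url = f"{BASE_URL}/extract/task/{task_id}"
    for _ in range(max_attempts):
        body = fetch(url, token)
        status = body.get("status")
        if status == "done":
            return body
        if status == "failed":
            raise RuntimeError(f"task {task_id} failed: {body}")
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"task {task_id} not done after {max_attempts} attempts")
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without network access; in normal use, `poll_task(task_id, os.environ["MINERU_TOKEN"])` would be enough.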
# 1. Get upload URLs
curl -X POST "https://mineru.net/api/v4/file-urls/batch" \
  -H "Authorization: Bearer $MINERU_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"file_names": ["paper1.pdf", "paper2.pdf"]}'
# 2. Upload the files to the returned presigned URLs
# 3. Submit the tasks as a batch
curl -X POST "https://mineru.net/api/v4/extract/task/batch" \
  -H "Authorization: Bearer $MINERU_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"files": [{"url": "...", "name": "paper1.pdf"}, ...]}'
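The three batch steps above can be sketched in Python as follows. The request bodies mirror the curl examples; the response format of the upload-URL endpoint is not shown in this document, so the caller is expected to supply the name-to-URL mapping. Uploading with HTTP `PUT` is an assumption about how the presigned URLs work, and all function names here are hypothetical.

```python
import os
import urllib.request


def build_file_urls_request(paths):
    """Body for POST /file-urls/batch, built from local file paths."""
    return {"file_names": [os.path.basename(p) for p in paths]}


def upload_to_presigned(path, upload_url, opener=urllib.request.urlopen):
    """Upload one local file to a presigned URL.

    Uses HTTP PUT, which is an assumption; the document only says
    "upload the files to the returned presigned URLs".
    """
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(upload_url, data=data, method="PUT")
    with opener(req, timeout=300) as resp:
        return resp.status


def build_batch_task_request(file_urls):
    """Body for POST /extract/task/batch.

    `file_urls` maps a file name to the URL the service should fetch.
    """
    return {"files": [{"url": url, "name": name} for name, url in file_urls.items()]}
```

Keeping the payload builders separate from the network calls makes it easy to inspect the JSON before anything is sent.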
| Parameter | Type | Description |
|------|------|------|
| url | string | File URL (http/https supported) |
| enable_formula | bool | Enable formula recognition (default: true) |
| enable_table | bool | Enable table recognition (default: true) |
| layout_model | string | doclayout_yolo (faster) / layoutlmv3 (more accurate) |
| language | string | en / ch / auto |
| model_version | string | pipeline / vlm / MinerU-HTML |

| Version | Speed | Accuracy | Best for |
|------|------|--------|----------|
| pipeline | ⚡ Fast | High | Ordinary documents |
| vlm | 🐢 Slow | Highest | Complex layouts |
| MinerU-HTML | ⚡ Fast | High | Web-page-style output |
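The parameter and model-version tables can be turned into a small client-side check before a task is submitted. This is a minimal sketch, assuming the values listed in the tables are the complete set of accepted values (the service may accept more); `build_task_payload` is a hypothetical helper name.

```python
# Allowed values copied from the tables; treating them as exhaustive is an assumption.
ALLOWED_LAYOUT_MODELS = {"doclayout_yolo", "layoutlmv3"}
ALLOWED_LANGUAGES = {"en", "ch", "auto"}
ALLOWED_MODEL_VERSIONS = {"pipeline", "vlm", "MinerU-HTML"}


def build_task_payload(url, enable_formula=True, enable_table=True,
                       layout_model=None, language=None, model_version=None):
    """Build the JSON body for a single extract task, validating known fields."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    payload = {
        "url": url,
        "enable_formula": bool(enable_formula),
        "enable_table": bool(enable_table),
    }
    optional = (
        ("layout_model", layout_model, ALLOWED_LAYOUT_MODELS),
        ("language", language, ALLOWED_LANGUAGES),
        ("model_version", model_version, ALLOWED_MODEL_VERSIONS),
    )
    for key, value, allowed in optional:
        if value is None:
            continue  # omit the field and let the service apply its default
        if value not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}, got {value!r}")
        payload[key] = value
    return payload
```

For example, `build_task_payload("https://arxiv.org/pdf/2410.17247", language="en", model_version="vlm")` produces a body that can be serialized with `json.dumps` and sent to the task endpoint.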
The ZIP downloaded after parsing completes contains:
output/
├── full.md # Full Markdown
├── content_list.json # Structured content
├── images/ # Extracted images
└── layout.json # Layout analysis results
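The archive layout above can be read directly from Python without unpacking it to disk. This is a minimal sketch using only the standard library; members are matched by file-name suffix so that a different top-level directory or a name prefix does not break the lookup. `read_result_zip` is a hypothetical helper, and the internal schema of `content_list.json` is not assumed beyond it being valid JSON.

```python
import json
import zipfile


def _find_member(names, suffix):
    """Return the first archive member whose file name ends with `suffix`."""
    for name in names:
        if name.rsplit("/", 1)[-1].endswith(suffix):
            return name
    raise KeyError(f"no member ending with {suffix!r} in archive")


def read_result_zip(zip_source):
    """Load the Markdown, structured content, and image names from a result ZIP.

    `zip_source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        names = zf.namelist()
        markdown = zf.read(_find_member(names, "full.md")).decode("utf-8")
        content_list = json.loads(zf.read(_find_member(names, "content_list.json")))
        # Keep files that live under an "images" directory, skipping directory entries.
        images = [
            n for n in names
            if "images" in n.split("/")[:-1] and not n.endswith("/")
        ]
    return {"markdown": markdown, "content_list": content_list, "images": images}
```

Calling `read_result_zip("result.zip")["markdown"]` gives the full document text, which is usually what downstream tools need first.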
# 1. Create a directory for the paper
mkdir -p "./paper-reading/[CVPR 2025] NewPaper"
cd "./paper-reading/[CVPR 2025] NewPaper"
# 2. Submit the parsing task
TASK_ID=$(curl -s -X POST "https://mineru.net/api/v4/extract/task" \
  -H "Authorization: Bearer $MINERU_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://arxiv.org/pdf/XXXX.XXXXX"}' | jq -r '.task_id')
# 3. Wait for completion and download
# (poll status until it is done, then download result.zip)
# 4. Unzip
unzip result.zip -d .
Set the token in ~/.bashrc or in the OpenClaw config:
export MINERU_TOKEN="your_api_key_here"
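A script can read the exported variable and fail early when it is missing. This is a minimal sketch; `auth_headers` is a hypothetical helper, and the header names match the curl examples in this document.

```python
import os


def auth_headers(env=os.environ):
    """Build request headers from the MINERU_TOKEN environment variable."""
    token = env.get("MINERU_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "MINERU_TOKEN is not set; export it in ~/.bashrc or the OpenClaw config"
        )
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Failing before the first request gives a clearer error than an authentication failure returned by the server.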
| Limit | Value |
|------|------|
| Max file size | 200 MB |
| Max pages per file | 600 pages |
| Concurrent tasks | Depends on plan |
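The size and page limits can be checked locally before uploading. This is a minimal sketch, assuming "200 MB" means 200 × 1024 × 1024 bytes (the document does not say whether MB is decimal or binary); the page count must come from the caller, since counting pages needs a PDF library that is outside this sketch. `check_limits` is a hypothetical helper.

```python
MAX_FILE_BYTES = 200 * 1024 * 1024  # 200 MB, read as binary megabytes (assumption)
MAX_PAGES = 600                     # per-file page limit from the table


def check_limits(size_bytes, page_count=None):
    """Return a list of limit violations; an empty list means the file fits."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(
            f"file is {size_bytes} bytes, above the {MAX_FILE_BYTES}-byte limit"
        )
    if page_count is not None and page_count > MAX_PAGES:
        problems.append(f"file has {page_count} pages, above the {MAX_PAGES}-page limit")
    return problems
```

For a local file, `check_limits(os.path.getsize(path))` covers the size check; pass `page_count` as well when it is known.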
https://arxiv.org/pdf/2410.17247
language: ch
vlm model
No more manual copy-pasting to parse papers! 📖
Generated Feb 23, 2026
Researchers can automatically parse arXiv PDFs into structured Markdown with LaTeX formulas and tables preserved, enabling quick literature reviews and data extraction without manual copying. This is ideal for summarizing papers, building knowledge bases, or preparing annotated bibliographies.
Law firms can convert scanned contracts, Word documents, or PDFs into searchable Markdown text while retaining complex layouts and tables, streamlining document review and analysis for cases or compliance checks. OCR support handles mixed-language content in legal materials.
Companies can extract data from financial reports, PowerPoint presentations, and Word documents to create structured summaries or integrate content into databases, improving efficiency in reporting and decision-making processes. Batch processing allows handling multiple quarterly reports at once.
Libraries or museums can digitize historical documents, books, and images by converting them into Markdown with OCR, preserving formulas and tables for digital archives or online publications. This supports heritage preservation and accessibility initiatives.
Engineering teams can parse technical manuals, diagrams in PDFs, or PPT slides into Markdown to update documentation, extract specifications, or feed into knowledge management systems, ensuring accurate retention of complex tables and formulas.
Offer tiered subscription plans based on usage quotas, such as number of pages or files processed per month, with premium tiers for higher concurrency or advanced features like VLM model access. Revenue comes from recurring payments from businesses and researchers.
Provide custom licenses to large organizations for on-premise deployment or dedicated API instances, including support, training, and integration services. This targets industries like legal or finance with high-volume, secure document processing needs.
Implement a usage-based pricing model where users pay per document or page processed, appealing to occasional users or small projects. Integrate with platforms like OpenClaw for seamless billing and low-barrier access to document parsing capabilities.
💬 Integration Tip
Set the MINERU_TOKEN environment variable in OpenClaw config for easy authentication, and use batch processing to optimize API quota when handling multiple files.