mineru-pdf-parser-clawdbot-skill
Parse PDFs locally (CPU) into Markdown/JSON using MinerU. Assumes MinerU creates per-doc output folders; supports table/image extraction.
Install via ClawdBot CLI:
clawdbot install kesslerio/mineru-pdf-parser-clawdbot-skill
Parse a PDF locally with MinerU (CPU). Default output is Markdown + JSON. Use tables/images only when requested.
# Run from the skill directory
./scripts/mineru_parse.sh /path/to/file.pdf
Optional examples:
./scripts/mineru_parse.sh /path/to/file.pdf --format json
./scripts/mineru_parse.sh /path/to/file.pdf --tables --images
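When calling the wrapper from another script, it can help to validate the input first. The following is a minimal sketch, not part of the skill: the function name parse_pdf and the MINERU_PARSE override variable are hypothetical, and the only thing assumed about the wrapper is the invocation shown above (PDF path first, then optional flags).

```shell
#!/usr/bin/env bash
# Sketch only: validate the input before handing it to the skill's wrapper.
# MINERU_PARSE is a hypothetical override, not something the skill defines.
MINERU_PARSE="${MINERU_PARSE:-./scripts/mineru_parse.sh}"

parse_pdf() {
  local pdf="$1"
  shift
  # Refuse paths that do not point at a regular file.
  if [ ! -f "$pdf" ]; then
    echo "error: no such file: $pdf" >&2
    return 1
  fi
  # Refuse anything without a .pdf extension.
  case "$pdf" in
    *.pdf|*.PDF) ;;
    *) echo "error: not a .pdf file: $pdf" >&2; return 1 ;;
  esac
  # Remaining arguments (e.g. --format json, --tables, --images) pass through unchanged.
  "$MINERU_PARSE" "$pdf" "$@"
}
```

For example, parse_pdf /path/to/file.pdf --format json runs the same command as the second example above, but fails early with a message if the file is missing or is not a PDF.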
If flags differ from your wrapper or you need advanced defaults (backend/method/device/threads/format mapping), read:
references/mineru-cli.md
Output is written under ./mineru-output/ (per-document folders: ./mineru-output//... ).
Default is single-PDF parsing. Only implement batch folder parsing if explicitly requested.
Generated Mar 1, 2026
Law firms can parse contracts and legal briefs into structured formats for easier review and analysis. Extracting tables and images helps in visualizing evidence and financial data.
Researchers can convert PDF research papers into Markdown or JSON for data mining and citation extraction. This aids in literature reviews and meta-analyses by enabling text analysis tools.
Financial analysts can extract tables and text from quarterly reports and statements into JSON for automated data integration into spreadsheets or databases. This streamlines financial modeling and compliance checks.
Healthcare providers can parse medical records and lab reports from PDFs into structured formats for electronic health record systems. Image extraction supports diagnostic imaging and chart analysis.
Government agencies can batch-process public records and forms into searchable Markdown or JSON formats. This improves accessibility and data retrieval for public services and audits.
Offer a cloud-based service where users upload PDFs for parsing via an API, with tiered pricing based on volume and features like table extraction. This targets businesses needing scalable document processing.
Sell licenses for local deployment of the skill, ideal for organizations with data privacy concerns or offline requirements. Includes support and updates for a one-time fee or annual maintenance.
Provide consulting services to integrate the PDF parser into existing workflows, with customization for specific industries like legal or finance. This includes training and ongoing support contracts.
💬 Integration Tip
Ensure MinerU is installed locally and output folders are configured correctly; use the provided scripts for quick testing before full integration.
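The checks in this tip could be scripted along the following lines. This is a sketch under stated assumptions: the function name preflight is hypothetical, the MinerU command name is taken as an argument because it may differ between installs ("mineru" is only an assumed default), and ./mineru-output is the output root mentioned earlier on this page.

```shell
# Sketch only: preflight checks before wiring the skill into a larger workflow.
# Both the binary name and the output directory are passed in by the caller;
# "mineru" and ./mineru-output are assumptions to adjust for your install.
preflight() {
  local bin="${1:-mineru}"
  local out_dir="${2:-./mineru-output}"
  # Is the command available on PATH?
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "missing: $bin not found on PATH" >&2
    return 1
  fi
  # Can the output directory be created and written to?
  if ! mkdir -p "$out_dir" 2>/dev/null || [ ! -w "$out_dir" ]; then
    echo "missing: cannot write to $out_dir" >&2
    return 1
  fi
  echo "ok: $bin found, $out_dir writable"
}
```

Running preflight with no arguments checks the assumed defaults; pass a different command name or directory if your setup differs.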
Edit PDFs with natural-language instructions using the nano-pdf CLI.
Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, YouTube URLs, or EPubs to Markdown format for LLM processing or text analysis.
Parse PDF, Word, PPT, and image files into Markdown using the MinerU API, with support for formulas, tables, and OCR. Suited to paper parsing and document extraction.
Generate hand-drawn style diagrams, flowcharts, and architecture diagrams as PNG images from Excalidraw JSON.
The awesome PPT format generation tool provided by Baidu.