upstage-document-parse
Parse documents (PDF, images, DOCX, PPTX, XLSX, HWP) using the Upstage Document Parse API. Extracts text, tables, figures, and layout elements with bounding boxes.
Install via ClawdBot CLI:

```bash
clawdbot install upstage-deployment/upstage-document-parse
```

Extract structured content from documents using Upstage's Document Parse API.
Supported formats: PDF (up to 1000 pages with async), PNG, JPG, JPEG, TIFF, BMP, GIF, WEBP, DOCX, PPTX, XLSX, HWP
Install with ClawHub:

```bash
clawhub install upstage-document-parse
```

Set your API key:

```bash
openclaw config set skills.entries.upstage-document-parse.apiKey "your-api-key"
```

Or add it to ~/.openclaw/openclaw.json:

```json
{
  "skills": {
    "entries": {
      "upstage-document-parse": {
        "apiKey": "your-api-key"
      }
    }
  }
}
```
Just ask the agent to parse your document:
"Parse this PDF: ~/Documents/report.pdf"
"Parse: ~/Documents/report.jpg"
Synchronous API: for small documents (fewer than 20 pages recommended).
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model | string | required | Use document-parse (latest) or document-parse-nightly |
| document | file | required | Document file to parse |
| mode | string | standard | standard (text-focused), enhanced (complex tables and images), or auto (API decides per page) |
| ocr | string | auto | auto (OCR applied to image inputs only) or force (always OCR) |
| output_formats | string | ['html'] | Any of text, html, markdown, written as a list, e.g. ['markdown'] |
| coordinates | boolean | true | Include bounding box coordinates |
| base64_encoding | string | [] | Elements to base64: ["table"], ["figure"], etc. |
| chart_recognition | boolean | true | Convert charts to tables (Beta) |
| merge_multipage_tables | boolean | false | Merge tables across pages (Beta, max 20 pages if true) |
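A small Python helper can reject invalid option values before a request is sent. This is a sketch that only knows the values listed in the table; `build_form_data` is a hypothetical helper, not part of any Upstage SDK. It serializes list-valued options into the bracketed string form used in the curl examples (e.g. `['markdown']`), and sending booleans as lowercase strings is an assumption.

```python
# Hypothetical helper: validates options against the documented values
# and returns form fields for the sync endpoint.
VALID_MODES = {"standard", "enhanced", "auto"}
VALID_OCR = {"auto", "force"}
VALID_FORMATS = {"text", "html", "markdown"}


def _as_list_string(values):
    """Serialize ["html", "markdown"] as "['html', 'markdown']"."""
    return "[" + ", ".join(f"'{v}'" for v in values) + "]"


def build_form_data(model="document-parse", mode="standard", ocr="auto",
                    output_formats=("html",), base64_encoding=(),
                    coordinates=True, chart_recognition=True,
                    merge_multipage_tables=False):
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if ocr not in VALID_OCR:
        raise ValueError(f"unknown ocr value: {ocr!r}")
    unknown = set(output_formats) - VALID_FORMATS
    if unknown:
        raise ValueError(f"unknown output formats: {sorted(unknown)}")
    return {
        "model": model,
        "mode": mode,
        "ocr": ocr,
        "output_formats": _as_list_string(output_formats),
        "base64_encoding": _as_list_string(base64_encoding),
        # Booleans sent as lowercase strings (an assumption about the form encoding)
        "coordinates": str(coordinates).lower(),
        "chart_recognition": str(chart_recognition).lower(),
        "merge_multipage_tables": str(merge_multipage_tables).lower(),
    }
```

The returned dict can be passed as `data=` to `requests.post`, alongside `files={"document": f}`.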
Basic request:

```bash
curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@/path/to/file.pdf" \
  -F "model=document-parse"
```
Markdown output:

```bash
curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@report.pdf" \
  -F "model=document-parse" \
  -F "output_formats=['markdown']"
```
Enhanced mode for complex documents, with HTML and Markdown output:

```bash
curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@complex.pdf" \
  -F "model=document-parse" \
  -F "mode=enhanced" \
  -F "output_formats=['html', 'markdown']"
```
Force OCR for scanned documents:

```bash
curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@scan.pdf" \
  -F "model=document-parse" \
  -F "ocr=force"
```
Return table images as base64:

```bash
curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@invoice.pdf" \
  -F "model=document-parse" \
  -F "base64_encoding=['table']"
```
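When `base64_encoding=['table']` is requested, the encoded images still have to be decoded on the client. The sketch below assumes each encoded element carries its image under a `base64_encoding` key and that the images are PNG; both are assumptions to verify against a real response, and `save_encoded_elements` is a hypothetical helper.

```python
import base64
from pathlib import Path


def save_encoded_elements(response_json, out_dir, category="table"):
    """Decode base64 images of one element category and write them to out_dir.

    Assumes each encoded element stores its image under "base64_encoding";
    elements of other categories, or without that key, are skipped.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for element in response_json.get("elements", []):
        data = element.get("base64_encoding")
        if element.get("category") != category or not data:
            continue
        # The .png extension is an assumption; inspect the decoded header if unsure
        path = out / f"{category}_{element['id']}_p{element.get('page', 0)}.png"
        path.write_bytes(base64.b64decode(data))
        written.append(path)
    return written
```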
Example response (abridged):

```json
{
  "api": "2.0",
  "model": "document-parse-251217",
  "content": {
    "html": "<h1>...</h1>",
    "markdown": "# ...",
    "text": "..."
  },
  "elements": [
    {
      "id": 0,
      "category": "heading1",
      "content": { "html": "...", "markdown": "...", "text": "..." },
      "page": 1,
      "coordinates": [{"x": 0.06, "y": 0.05}, ...]
    }
  ],
  "usage": { "pages": 1 }
}
```
Element categories: paragraph, heading1, heading2, heading3, list, table, figure, chart, equation, caption, header, footer, index, footnote
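The `elements` list can be filtered by category, and the coordinates mapped back onto a rendered page. This sketch assumes, based on the sample values such as `0.06`, that coordinates are corner points normalized to the 0 to 1 range relative to page width and height; both helper names are hypothetical.

```python
def elements_by_category(response_json, category):
    """Return elements of one category (e.g. "table"), ordered by page then id."""
    found = [e for e in response_json.get("elements", []) if e.get("category") == category]
    return sorted(found, key=lambda e: (e.get("page", 0), e.get("id", 0)))


def element_bbox(element, page_width, page_height):
    """Return (left, top, right, bottom) in pixels for one element.

    Assumes "coordinates" is a list of {"x", "y"} points normalized to 0-1.
    """
    xs = [p["x"] * page_width for p in element["coordinates"]]
    ys = [p["y"] * page_height for p in element["coordinates"]]
    return min(xs), min(ys), max(xs), max(ys)
```

For a page rendered at 1000 by 2000 pixels, a table with corners at x = 0.25 and 0.75 and y = 0.5 and 0.75 maps to the box (250.0, 1000.0, 750.0, 1500.0).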
Asynchronous API: for documents up to 1000 pages, processed in batches of 10 pages.
Submit a document:

```bash
curl -X POST "https://api.upstage.ai/v1/document-digitization/async" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@large.pdf" \
  -F "model=document-parse" \
  -F "output_formats=['markdown']"
```
Response:

```json
{"request_id": "uuid-here"}
```
Check the status of a request:

```bash
curl "https://api.upstage.ai/v1/document-digitization/requests/{request_id}" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY"
```
Response includes download_url for each batch (available for 30 days).
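With 10 pages per batch, a document of n pages is split into ceil(n / 10) batches, so the 1000-page maximum yields at most 100 batches and download URLs. The sketch below only computes the expected page ranges locally; it makes no assumption about how the API labels its batches, and `batch_ranges` is a hypothetical name.

```python
def batch_ranges(total_pages, batch_size=10):
    """Split pages 1..total_pages into inclusive (start, end) ranges."""
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]
```

For a 23-page PDF this gives three batches: pages 1 to 10, 11 to 20, and 21 to 23.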
List all requests:

```bash
curl "https://api.upstage.ai/v1/document-digitization/requests" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY"
```
Request status values:
- submitted: Request received
- started: Processing in progress
- completed: Ready for download
- failed: Error occurred (check failure_message)

Python example:

```python
import os
import time

import requests

# Read the key from the environment instead of hard-coding it
api_key = os.environ["UPSTAGE_API_KEY"]
headers = {"Authorization": f"Bearer {api_key}"}

# Sync: small documents
with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://api.upstage.ai/v1/document-digitization",
        headers=headers,
        files={"document": f},
        data={"model": "document-parse", "output_formats": "['markdown']"},
    )
response.raise_for_status()  # surface HTTP errors before reading the body
print(response.json()["content"]["markdown"])

# Async: large documents
with open("large.pdf", "rb") as f:
    r = requests.post(
        "https://api.upstage.ai/v1/document-digitization/async",
        headers=headers,
        files={"document": f},
        data={"model": "document-parse", "output_formats": "['markdown']"},
    )
r.raise_for_status()
request_id = r.json()["request_id"]

# Poll for results; stop on failure instead of looping forever
while True:
    status = requests.get(
        f"https://api.upstage.ai/v1/document-digitization/requests/{request_id}",
        headers=headers,
    ).json()
    if status["status"] == "completed":
        break
    if status["status"] == "failed":
        raise RuntimeError(status.get("failure_message", "request failed"))
    time.sleep(5)
```
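Once a request is completed, each batch result still has to be downloaded and stitched together. This sketch assumes the status JSON lists batches under a `batches` key, each with `start_page` and `download_url`, and that every download is JSON shaped like the sync response; none of that schema is confirmed on this page. `merge_batch_markdown` is hypothetical, and the download function is passed in so the merging logic can be tested without network access.

```python
def merge_batch_markdown(status, fetch_json):
    """Join the markdown of every batch in page order.

    Assumed schema: status["batches"] holds dicts with "start_page" and
    "download_url"; each download looks like the sync response.
    fetch_json maps a URL to parsed JSON, e.g. lambda u: requests.get(u).json().
    """
    batches = sorted(status.get("batches", []), key=lambda b: b.get("start_page", 0))
    parts = []
    for batch in batches:
        url = batch.get("download_url")
        if not url:
            raise ValueError(f"batch starting at page {batch.get('start_page')} has no download_url")
        parts.append(fetch_json(url)["content"]["markdown"])
    return "\n\n".join(parts)
```

Markdown is only present in the results if it was requested at submission time, as with `output_formats=['markdown']` in the async curl example.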
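Using LangChain (langchain-upstage package):

```python
from langchain_upstage import UpstageDocumentParseLoader

# The API key is taken from UPSTAGE_API_KEY unless api_key=... is passed
loader = UpstageDocumentParseLoader(
    file_path="document.pdf",
    output_format="markdown",
    ocr="auto",
)
docs = loader.load()
```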
You can also set the API key as an environment variable:

```bash
export UPSTAGE_API_KEY="your-api-key"
```
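Scripts that run both inside and outside the agent may need to find the key in either place. The sketch below checks `UPSTAGE_API_KEY` first and then falls back to the `skills.entries.upstage-document-parse.apiKey` entry in `~/.openclaw/openclaw.json`. That precedence order is an assumption for illustration, not documented behavior of the skill, and `resolve_api_key` is a hypothetical helper.

```python
import json
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".openclaw" / "openclaw.json"


def resolve_api_key(config_path=CONFIG_PATH, environ=os.environ):
    """Return the Upstage API key from the environment or the OpenClaw config.

    Assumed order: environment variable first, then the config file.
    """
    key = environ.get("UPSTAGE_API_KEY")
    if key:
        return key
    path = Path(config_path)
    if path.is_file():
        config = json.loads(path.read_text())
        key = (config.get("skills", {})
                     .get("entries", {})
                     .get("upstage-document-parse", {})
                     .get("apiKey"))
        if key:
            return key
    raise RuntimeError("no Upstage API key found")
```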
Tips:
- mode=enhanced for complex tables, charts, and images
- mode=auto to let the API decide per page
- ocr=force for scanned PDFs or images
- merge_multipage_tables=true to combine tables split across pages (max 20 pages with enhanced mode)

Generated Mar 1, 2026
Law firms can use this skill to parse contracts, legal briefs, and court documents to extract key clauses, terms, and structured data. The bounding box coordinates help identify specific sections for legal review, while markdown conversion enables easy integration into case management systems.
Researchers and academic institutions can parse PDF research papers to extract text, tables, figures, and equations into structured formats. This enables automated literature reviews, data extraction for meta-analyses, and conversion of papers into accessible HTML for online repositories.
Financial analysts can process quarterly reports, balance sheets, and financial statements to extract tables and numerical data. The enhanced mode handles complex financial tables, while base64 encoding preserves table images for audit trails and compliance documentation.
Healthcare providers can parse scanned medical records, lab reports, and patient forms to extract structured data. Setting ocr=force applies OCR to every page, which suits scanned or handwritten documents, while async processing handles long multi-page patient records.
Companies can automate accounts payable by parsing vendor invoices in PDF or image formats to extract line items, totals, and vendor information. The table extraction with coordinates helps identify specific data fields for integration with accounting software.
Offer document parsing as a cloud API service with pay-per-use pricing based on pages processed. Target developers and businesses needing to integrate document parsing into their applications without maintaining infrastructure. Provide tiered plans with different processing limits and support levels.
Sell customized document parsing solutions to large enterprises with specific industry needs (legal, healthcare, finance). Include on-premise deployment options, dedicated support, and custom model training for specialized document types. Bundle with existing enterprise software platforms.
Provide white-label document parsing infrastructure to other SaaS companies who want to offer document processing features without building their own. Include branding customization, API management, and analytics dashboards. Target companies in compliance, education, and content management sectors.
💬 Integration Tip
Set UPSTAGE_API_KEY as an environment variable before use, and make sure curl is installed for command-line use. For large documents, use the async API and monitor the request status until it completes or fails.
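The page-count limits and tips on this page can be condensed into a small decision helper. This is a heuristic sketch: `choose_request` and its inputs are hypothetical, and the thresholds come only from the limits stated here (sync recommended under 20 pages, async up to 1000 pages, table merging capped at 20 pages).

```python
def choose_request(pages, scanned=False, complex_layout=False, merge_tables=False):
    """Return ("sync" or "async", form fields) for a document.

    Heuristic built from the limits stated on this page.
    """
    if not 1 <= pages <= 1000:
        raise ValueError("page count must be between 1 and 1000")
    fields = {"model": "document-parse"}
    if scanned:
        fields["ocr"] = "force"  # OCR every page
    if complex_layout:
        fields["mode"] = "enhanced"  # complex tables, charts, images
    if merge_tables:
        if pages > 20:
            raise ValueError("merge_multipage_tables supports at most 20 pages")
        fields["merge_multipage_tables"] = "true"
    # Sync is recommended below 20 pages; larger files go to the async endpoint
    endpoint = "sync" if pages < 20 else "async"
    return endpoint, fields
```

A 300-page scanned report with complex tables would be routed to the async endpoint with ocr=force and mode=enhanced.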