docstrange
Document extraction API by Nanonets. Convert PDFs and images to markdown, JSON, or CSV with confidence scoring. Use when you need to OCR documents, extract invoice fields, parse receipts, or convert tables to structured data.
Install via ClawdBot CLI:
clawdbot install shhdwi/docstrange
Document extraction API: convert PDFs, images, and documents to markdown, JSON, or CSV with per-field confidence scoring.
Get your API key: https://docstrange.nanonets.com/app
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=markdown"
Response:
{
"success": true,
"record_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"result": {
"markdown": {
"content": "# Invoice\n\n**Invoice Number:** INV-2024-001..."
}
}
}
# Visit the dashboard
https://docstrange.nanonets.com/app
Save your API key:
export DOCSTRANGE_API_KEY="your_api_key_here"
Recommended: Use environment variables (most secure):
{
skills: {
entries: {
"docstrange": {
enabled: true,
// API key loaded from environment variable DOCSTRANGE_API_KEY
},
},
},
}
Alternative: Store in config file (use with caution):
{
skills: {
entries: {
"docstrange": {
enabled: true,
env: {
DOCSTRANGE_API_KEY: "your_api_key_here",
},
},
},
},
}
Security Note: If you store API keys in ~/.openclaw/openclaw.json, restrict the file's permissions so only your user can read it:
chmod 600 ~/.openclaw/openclaw.json
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=markdown"
Access content: response["result"]["markdown"]["content"]
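As a minimal sketch of the access path above, the helper below pulls the markdown text out of a parsed response and fails with a readable message when the path is missing. The function name `markdown_content` is hypothetical, and the sketch assumes the response body has already been parsed into a Python dict (for example with `json.loads`).

```python
from typing import Any, Mapping


def markdown_content(response: Mapping[str, Any]) -> str:
    """Return result.markdown.content from a parsed extraction response.

    Hypothetical helper: assumes the response shape shown in the example
    response above and nothing more.
    """
    try:
        return response["result"]["markdown"]["content"]
    except (KeyError, TypeError) as exc:
        # Surface the reported status so a still-processing record is easy to spot.
        status = response.get("status")
        raise ValueError(f"no markdown content in response (status={status!r})") from exc
```

Raising instead of returning an empty string keeps a missing result from being mistaken for an empty document.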
Simple field list:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@invoice.pdf" \
-F "output_format=json" \
-F 'json_options=["invoice_number", "date", "total_amount", "vendor"]' \
-F "include_metadata=confidence_score"
With JSON schema:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@invoice.pdf" \
-F "output_format=json" \
-F 'json_options={"type": "object", "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}}'
Response with confidence scores:
{
"result": {
"json": {
"content": {
"invoice_number": "INV-2024-001",
"total_amount": 500.00
},
"metadata": {
"confidence_score": {
"invoice_number": 98,
"total_amount": 99
}
}
}
}
}
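One way to use these scores is to flag fields for manual review. The sketch below assumes the response shape shown above and that scores are on a 0 to 100 scale, as the example values suggest; the function name `low_confidence_fields` and the default threshold of 90 are illustrative choices, not part of the API.

```python
from typing import Any, Dict, Mapping


def low_confidence_fields(response: Mapping[str, Any], threshold: float = 90) -> Dict[str, float]:
    """Return {field: score} for every field scoring below `threshold`.

    Assumes result.json.metadata.confidence_score maps field names to
    numeric scores, as in the example response. The threshold is arbitrary.
    """
    json_result = response["result"]["json"]
    scores = json_result.get("metadata", {}).get("confidence_score", {})
    return {field: score for field, score in scores.items() if score < threshold}
```

If `include_metadata=confidence_score` was not sent, the metadata is absent and the function returns an empty dict, so callers may want to check for that case separately.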
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@table.pdf" \
-F "output_format=csv" \
-F "csv_options=table"
For documents longer than 5 pages, use the async endpoint and poll for results:
Queue the document:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/async" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@large-document.pdf" \
-F "output_format=markdown"
# Returns: {"record_id": "12345", "status": "processing"}
Poll for results:
curl -X GET "https://extraction-api.nanonets.com/api/v1/extract/results/12345" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY"
# Returns: {"status": "completed", "result": {...}}
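The queue-then-poll flow above can be sketched as a small loop. To stay independent of any HTTP library, the loop takes a `fetch_status` callable, a hypothetical function that performs the GET on `/extract/results/{record_id}` and returns the parsed JSON body. Only the `processing` and `completed` statuses appear in the examples above; treating any other status as an error is an assumption, as are the interval and attempt limits.

```python
import time
from typing import Any, Callable, Dict


def wait_for_result(
    fetch_status: Callable[[str], Dict[str, Any]],
    record_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the record is completed and return its `result` object."""
    for _ in range(max_attempts):
        payload = fetch_status(record_id)
        status = payload.get("status")
        if status == "completed":
            return payload["result"]
        if status != "processing":
            # Assumption: anything other than the two documented statuses is a failure.
            raise RuntimeError(f"unexpected status {status!r} for record {record_id}")
        sleep(interval_s)
    raise TimeoutError(f"record {record_id} still processing after {max_attempts} polls")
```

Passing `sleep` as a parameter makes the loop easy to exercise in tests without real delays.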
Get element coordinates for layout analysis:
-F "include_metadata=bounding_boxes"
Extract document structure (sections, tables, key-value pairs):
-F "json_options=hierarchy_output"
Enhanced table and number formatting:
-F "markdown_options=financial-docs"
Guide extraction with prompts:
-F "custom_instructions=Focus on financial data. Ignore headers."
-F "prompt_mode=append"
Request multiple formats in one call:
-F "output_format=markdown,json"
| Document Size | Endpoint | Notes |
|---------------|----------|-------|
| <=5 pages | /extract/sync | Immediate response |
| >5 pages | /extract/async | Poll for results |
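The table's rule can be written as a one-line selector. This is a sketch only: it assumes the caller already knows the page count, and the names `extraction_endpoint` and `SYNC_PAGE_LIMIT` are hypothetical. The base URL is taken from the curl examples in this document.

```python
BASE_URL = "https://extraction-api.nanonets.com/api/v1"
SYNC_PAGE_LIMIT = 5  # documents up to this many pages go to the sync endpoint


def extraction_endpoint(page_count: int) -> str:
    """Pick the sync or async extraction URL from a document's page count."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    path = "/extract/sync" if page_count <= SYNC_PAGE_LIMIT else "/extract/async"
    return BASE_URL + path
```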
JSON Extraction:
["field1", "field2"]: quick extractions
{"type": "object", ...}: strict typing, nested data
Confidence Scores:
include_metadata=confidence_score
{
"type": "object",
"properties": {
"invoice_number": {"type": "string"},
"date": {"type": "string"},
"vendor": {"type": "string"},
"total": {"type": "number"},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"quantity": {"type": "number"},
"price": {"type": "number"}
}
}
}
}
}
{
"type": "object",
"properties": {
"merchant": {"type": "string"},
"date": {"type": "string"},
"total": {"type": "number"},
"items": {
"type": "array",
"items": {"type": "object", "properties": {"name": {"type": "string"}, "price": {"type": "number"}}}
}
}
}
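Schemas like the two above are sent as the string value of the `json_options` form field, as in the curl examples earlier. The sketch below serializes either a field list or a schema into that string; the function name `json_options_value` and its validation rules are assumptions made for illustration, not checks the API is known to perform.

```python
import json
from typing import Any, Dict, List, Union


def json_options_value(spec: Union[List[str], Dict[str, Any]]) -> str:
    """Serialize a field list or a JSON schema for the json_options form field."""
    if isinstance(spec, dict):
        # Assumption: schemas are top-level objects, like the examples above.
        if spec.get("type") != "object":
            raise ValueError('schema must have "type": "object" at the top level')
    elif isinstance(spec, list):
        if not spec or not all(isinstance(name, str) for name in spec):
            raise ValueError("field list must be a non-empty list of strings")
    else:
        raise TypeError("spec must be a list of field names or a schema dict")
    return json.dumps(spec)
```

Building the value with `json.dumps` avoids the quoting mistakes that are easy to make when writing JSON inline in a shell command.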
Important: Documents uploaded to DocStrange are transmitted to https://extraction-api.nanonets.com and processed on external servers.
Before uploading sensitive documents:
Best practices:
file_url with publicly accessible URLs instead of uploading large files directly
"your_api_key_here" in examples
400 Bad Request:
file, file_url, or file_base64
Sync Timeout:
/extract/results/{record_id}
Missing Confidence Scores:
json_options (field list or schema)
include_metadata=confidence_score
Authentication Errors:
DOCSTRANGE_API_KEY environment variable is set
Before publishing or updating this skill, verify:
package.json declares requiredEnv and primaryEnv for DOCSTRANGE_API_KEY
package.json lists API endpoints in endpoints array
"your_api_key_here") not real keys
SKILL.md or package.json
Generated Mar 1, 2026
Extract key fields like invoice number, date, vendor, and total amount from PDF invoices to automate accounts payable workflows. Use JSON output with confidence scoring to flag low-confidence entries for manual review, reducing data entry errors and speeding up processing.
Convert scanned receipts into structured data (e.g., CSV or JSON) to track expenses, categorize spending, and integrate with accounting software. This enables real-time expense reporting and compliance auditing for businesses.
Parse legal documents to extract clauses, dates, and parties into markdown or JSON for review and summarization. This aids in due diligence, contract management, and identifying key terms without manual reading.
Extract transaction details, balances, and dates from bank statements to analyze cash flow, detect anomalies, and generate reports. Use async extraction for large documents to handle multi-page statements efficiently.
OCR patient intake forms and medical records to convert them into structured data (e.g., JSON with custom schemas) for electronic health record systems. This improves data accuracy and accessibility while maintaining confidentiality.
Offer tiered pricing based on usage volume (e.g., number of documents processed per month) with features like advanced extraction modes and priority support. Target small to medium businesses needing scalable OCR solutions.
Charge per document or API call, appealing to developers and enterprises with variable workloads. Include add-ons for custom instructions or high-volume async processing to upsell power users.
Provide custom integrations, dedicated support, and on-premise deployment options for large organizations in regulated industries like finance or healthcare. Focus on security, compliance, and high-throughput needs.
Integration Tip
Use environment variables for API keys to enhance security, and start with sync extraction for small documents before scaling to async for larger files.
Edit PDFs with natural-language instructions using the nano-pdf CLI.
Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, YouTube URLs, or EPubs to Markdown format for LLM processing or text analysis.
Parse PDF, Word, PPT, and image files into Markdown with the MinerU API, with support for formulas, tables, and OCR. Suited to parsing academic papers and extracting document content.
Generate hand-drawn style diagrams, flowcharts, and architecture diagrams as PNG images from Excalidraw JSON
The awesome PPT format generation tool provided by Baidu.