deepread-ocr
AI-native OCR platform that turns documents into high-accuracy data in minutes. Using multi-model consensus, DeepRead achieves 97%+ accuracy and flags only uncertain fields for Human-in-the-Loop (HIL) review, reducing the share of fields that need manual review from 100% to 5-10%. Zero prompt engineering required.
Install via ClawdBot CLI:
clawdbot install DeepRead001/deepread-ocr
DeepRead is a production-grade document processing API. It returns high-accuracy structured data in minutes and flags uncertain fields, so manual review is limited to the flagged exceptions.
Core Features:
Uncertain fields are flagged (hil_flag) so only exceptions need manual review
Sign up and create an API key:
# Visit the dashboard
https://www.deepread.tech/dashboard
# Or use this direct link
https://www.deepread.tech/dashboard/?utm_source=clawdhub
Save your API key:
export DEEPREAD_API_KEY="sk_live_your_key_here"
Add to your clawdbot.config.json5:
{
skills: {
entries: {
"deepread": {
enabled: true
// API key is read from DEEPREAD_API_KEY environment variable
// Do NOT hardcode your API key here
}
}
}
}
Option A: With Webhook (Recommended)
# Upload PDF with webhook notification
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@document.pdf" \
-F "webhook_url=https://your-app.com/webhooks/deepread"
# Returns immediately
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "queued"
}
# Your webhook receives results when processing completes (2-5 minutes)
Option B: Poll for Results
# Upload PDF without webhook
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@document.pdf"
# Returns immediately
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "queued"
}
# Poll until completed
curl https://api.deepread.tech/v1/jobs/550e8400-e29b-41d4-a716-446655440000 \
-H "X-API-Key: $DEEPREAD_API_KEY"
Extract text as clean markdown:
# With webhook (recommended)
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@invoice.pdf" \
-F "webhook_url=https://your-app.com/webhook"
# OR poll for completion
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@invoice.pdf"
# Then poll
curl https://api.deepread.tech/v1/jobs/JOB_ID \
-H "X-API-Key: $DEEPREAD_API_KEY"
Response when completed:
{
"id": "550e8400-...",
"status": "completed",
"result": {
"text": "# INVOICE\n\n**Vendor:** Acme Corp\n**Total:** $1,250.00..."
}
}
Extract specific fields with confidence scoring:
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@invoice.pdf" \
-F 'schema={
"type": "object",
"properties": {
"vendor": {
"type": "string",
"description": "Vendor company name"
},
"total": {
"type": "number",
"description": "Total invoice amount"
},
"invoice_date": {
"type": "string",
"description": "Invoice date in MM/DD/YYYY format"
}
}
}'
Response includes confidence flags:
{
"status": "completed",
"result": {
"text": "# INVOICE\n\n**Vendor:** Acme Corp...",
"data": {
"vendor": {
"value": "Acme Corp",
"hil_flag": false,
"found_on_page": 1
},
"total": {
"value": 1250.00,
"hil_flag": false,
"found_on_page": 1
},
"invoice_date": {
"value": "2024-10-??",
"hil_flag": true,
"reason": "Date partially obscured",
"found_on_page": 1
}
},
"metadata": {
"fields_requiring_review": 1,
"total_fields": 3,
"review_percentage": 33.3
}
}
}
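The metadata block in the response above can be recomputed on the client from data, which is useful as a sanity check. This is a minimal sketch, assuming every field entry is an object with a boolean hil_flag as in the example response. The function name summarize_review is illustrative and not part of the DeepRead API.

```python
def summarize_review(data):
    """Count flagged fields in a result["data"] mapping shaped like the example response."""
    total = len(data)
    flagged = sum(1 for field in data.values() if field.get("hil_flag"))
    # Guard against an empty mapping to avoid dividing by zero.
    percentage = round(100.0 * flagged / total, 1) if total else 0.0
    return {
        "fields_requiring_review": flagged,
        "total_fields": total,
        "review_percentage": percentage,
    }


# Field entries copied from the example response above.
example_data = {
    "vendor": {"value": "Acme Corp", "hil_flag": False, "found_on_page": 1},
    "total": {"value": 1250.00, "hil_flag": False, "found_on_page": 1},
    "invoice_date": {
        "value": "2024-10-??",
        "hil_flag": True,
        "reason": "Date partially obscured",
        "found_on_page": 1,
    },
}
summary = summarize_review(example_data)
print(summary)
```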
Extract arrays and nested objects:
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@invoice.pdf" \
-F 'schema={
"type": "object",
"properties": {
"vendor": {"type": "string"},
"total": {"type": "number"},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"quantity": {"type": "number"},
"price": {"type": "number"}
}
}
}
}
}'
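When calling the API from code, it can be easier to build the schema as a native structure and serialize it than to hand-write JSON inside a shell string. The sketch below assumes the schema form field accepts any JSON string equivalent to the one shown above. build_invoice_schema is an illustrative helper, not part of DeepRead.

```python
import json


def build_invoice_schema():
    """Return the nested invoice schema from the curl example as a Python dict."""
    line_item = {
        "type": "object",
        "properties": {
            "description": {"type": "string"},
            "quantity": {"type": "number"},
            "price": {"type": "number"},
        },
    }
    return {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
            "line_items": {"type": "array", "items": line_item},
        },
    }


# The serialized string is what would be sent as the "schema" form field.
schema_field = json.dumps(build_invoice_schema())
form_fields = {"schema": schema_field}
print(schema_field)
```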
Get per-page OCR results with quality flags:
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@contract.pdf" \
-F "include_pages=true"
Response:
{
"result": {
"text": "Combined text from all pages...",
"pages": [
{
"page_number": 1,
"text": "# Contract Agreement\n\n...",
"hil_flag": false
},
{
"page_number": 2,
"text": "Terms and C??diti??s...",
"hil_flag": true,
"reason": "Multiple unrecognized characters"
}
],
"metadata": {
"pages_requiring_review": 1,
"total_pages": 2
}
}
}
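A per-page response like the one above can be filtered down to just the pages that need a second look. This is a sketch that assumes the pages array has the shape shown in the example. pages_needing_review is a hypothetical helper name.

```python
def pages_needing_review(result):
    """Return (page_number, reason) pairs for pages whose hil_flag is true."""
    return [
        (page["page_number"], page.get("reason", ""))
        for page in result.get("pages", [])
        if page.get("hil_flag")
    ]


# The "result" object from the example response above.
example_result = {
    "text": "Combined text from all pages...",
    "pages": [
        {"page_number": 1, "text": "# Contract Agreement\n\n...", "hil_flag": False},
        {
            "page_number": 2,
            "text": "Terms and C??diti??s...",
            "hil_flag": True,
            "reason": "Multiple unrecognized characters",
        },
    ],
    "metadata": {"pages_requiring_review": 1, "total_pages": 2},
}
flagged_pages = pages_needing_review(example_result)
print(flagged_pages)
```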
PDF → Convert → Rotate Correction → OCR → Multi-Model Validation → Extract → Done
The pipeline automatically handles:
DeepRead includes a built-in Human-in-the-Loop (HIL) review system. The AI compares extracted text to the original image and sets hil_flag on each field:
hil_flag: false = Clear, confident extraction → Auto-process
hil_flag: true = Uncertain extraction → Routed to human review
How HIL works:
Uncertain fields are returned with hil_flag: true and a reason
DeepRead Preview (preview.deepread.tech) is a dedicated HIL review interface where reviewers can see the original document side-by-side with extracted data, correct flagged fields, and approve results
The hil_flag data is also available in the API response
AI flags extractions when:
The flag is determined by a multimodal AI model, not by fixed rules.
Create reusable, optimized schemas for specific document types:
# List your blueprints
curl https://api.deepread.tech/v1/blueprints \
-H "X-API-Key: $DEEPREAD_API_KEY"
# Use blueprint instead of inline schema
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@invoice.pdf" \
-F "blueprint_id=660e8400-e29b-41d4-a716-446655440001"
Benefits:
How to create blueprints:
# Create a blueprint from training data
curl -X POST https://api.deepread.tech/v1/optimize \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "utility_invoice",
"description": "Optimized for utility invoices",
"document_type": "invoice",
"initial_schema": {
"type": "object",
"properties": {
"vendor": {"type": "string", "description": "Vendor name"},
"total": {"type": "number", "description": "Total amount"}
}
},
"training_documents": ["doc1.pdf", "doc2.pdf", "doc3.pdf"],
"ground_truth_data": [
{"vendor": "Acme Power", "total": 125.50},
{"vendor": "City Electric", "total": 89.25}
],
"target_accuracy": 95.0,
"max_iterations": 5
}'
# Returns: {"job_id": "...", "blueprint_id": "...", "status": "pending"}
# Check optimization status
curl https://api.deepread.tech/v1/blueprints/jobs/JOB_ID \
-H "X-API-Key: $DEEPREAD_API_KEY"
# Use blueprint (once completed)
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@invoice.pdf" \
-F "blueprint_id=BLUEPRINT_ID"
Get notified when processing completes instead of polling:
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@invoice.pdf" \
-F "webhook_url=https://your-app.com/webhooks/deepread"
Your webhook receives this payload when processing completes:
{
"job_id": "550e8400-...",
"status": "completed",
"created_at": "2025-01-27T10:00:00Z",
"completed_at": "2025-01-27T10:02:30Z",
"result": {
"text": "...",
"data": {...}
},
"preview_url": "https://preview.deepread.tech/abc1234"
}
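On the receiving side, the payload above can be reduced to a single routing decision. The sketch below is framework-agnostic: it takes the raw request body and returns what to do next. It assumes result.data in the webhook payload uses the same per-field shape (value, hil_flag) as the API response. The name handle_webhook and the action labels are illustrative.

```python
import json


def handle_webhook(body):
    """Decide what to do with a webhook body (a str or bytes containing JSON)."""
    payload = json.loads(body)
    job_id = payload.get("job_id")
    if payload.get("status") != "completed":
        # Anything other than "completed" is left for a human to look at.
        return {"job_id": job_id, "action": "investigate", "flagged": []}
    data = (payload.get("result") or {}).get("data") or {}
    flagged = sorted(
        name
        for name, field in data.items()
        if isinstance(field, dict) and field.get("hil_flag")
    )
    action = "review" if flagged else "auto_process"
    return {"job_id": job_id, "action": action, "flagged": flagged}


# Sample body with made-up field values, shaped like the payload above.
sample_body = json.dumps(
    {
        "job_id": "sample-job",
        "status": "completed",
        "result": {
            "text": "...",
            "data": {
                "vendor": {"value": "Acme Corp", "hil_flag": False},
                "invoice_date": {"value": "2024-10-??", "hil_flag": True},
            },
        },
    }
)
decision = handle_webhook(sample_body)
print(decision)
```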
Benefits:
DeepRead Preview (preview.deepread.tech) is the built-in Human-in-the-Loop review interface. Reviewers can view the original document alongside extracted data, correct flagged fields, and approve results. Preview URLs can also be shared without authentication:
# Request preview URL
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@document.pdf" \
-F "include_images=true"
# Get preview URL in response
{
"result": {
"text": "...",
"data": {...}
},
"preview_url": "https://preview.deepread.tech/Xy9aB12"
}
Public Preview Endpoint:
# No authentication required
curl https://api.deepread.tech/v1/preview/Xy9aB12
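In the examples above, the preview_url and the public preview endpoint end in the same token. If that holds in general (an assumption drawn only from these examples), the endpoint URL can be derived from a preview_url as sketched below. preview_api_url is a hypothetical helper.

```python
from urllib.parse import urlparse

API_PREVIEW_BASE = "https://api.deepread.tech/v1/preview/"


def preview_api_url(preview_url):
    """Map a preview.deepread.tech URL to the public preview endpoint URL."""
    # Assumption: the token is the single path segment of the preview URL.
    token = urlparse(preview_url).path.strip("/")
    if not token or "/" in token:
        raise ValueError(f"Unexpected preview URL: {preview_url!r}")
    return API_PREVIEW_BASE + token


endpoint = preview_api_url("https://preview.deepread.tech/Xy9aB12")
print(endpoint)
```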
Upgrade: https://www.deepread.tech/dashboard/billing?utm_source=clawdhub
Every response includes quota information:
X-RateLimit-Limit: 2000
X-RateLimit-Remaining: 1847
X-RateLimit-Used: 153
X-RateLimit-Reset: 1730419200
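These headers can be parsed into a small quota summary. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds (the example value is consistent with that, but the source does not state it) and that header names arrive with the casing shown. parse_quota is an illustrative name.

```python
from datetime import datetime, timezone


def parse_quota(headers):
    """Summarize the X-RateLimit-* headers from a response."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    # Assumption: the reset value is seconds since the Unix epoch.
    reset_at = datetime.fromtimestamp(int(headers["X-RateLimit-Reset"]), tz=timezone.utc)
    used_fraction = (limit - remaining) / limit if limit else 0.0
    return {
        "limit": limit,
        "remaining": remaining,
        "used_fraction": used_fraction,
        "reset_at": reset_at,
    }


quota = parse_quota(
    {
        "X-RateLimit-Limit": "2000",
        "X-RateLimit-Remaining": "1847",
        "X-RateLimit-Used": "153",
        "X-RateLimit-Reset": "1730419200",
    }
)
print(quota["remaining"], quota["reset_at"].isoformat())
```

A plain dict lookup is case-sensitive. HTTP client libraries usually expose headers through a case-insensitive mapping, which works with the same function.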
✅ Recommended: Webhook notifications
curl -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "file=@document.pdf" \
-F "webhook_url=https://your-app.com/webhook"
Only use polling if:
✅ Good: Descriptive field descriptions
{
"vendor": {
"type": "string",
"description": "Vendor company name. Usually in header or top-left of invoice."
}
}
❌ Bad: No description
{
"vendor": {"type": "string"}
}
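A schema can be checked for missing descriptions before it is sent. The following sketch walks properties recursively, including nested objects and array items. fields_missing_description is a hypothetical helper, and the dotted path format it reports is just one possible convention.

```python
def fields_missing_description(schema, prefix=""):
    """Return paths of schema properties that have no non-empty description."""
    missing = []
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if not spec.get("description"):
            missing.append(path)
        if spec.get("type") == "object":
            # Recurse into nested objects.
            missing += fields_missing_description(spec, path + ".")
        elif spec.get("type") == "array" and isinstance(spec.get("items"), dict):
            # Recurse into the item schema of arrays.
            missing += fields_missing_description(spec["items"], path + "[].")
    return missing


example_schema = {
    "type": "object",
    "properties": {
        "vendor": {
            "type": "string",
            "description": "Vendor company name. Usually in header or top-left of invoice.",
        },
        "line_items": {
            "type": "array",
            "items": {"type": "object", "properties": {"price": {"type": "number"}}},
        },
    },
}
missing_fields = fields_missing_description(example_schema)
print(missing_fields)
```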
Only if you can't use webhooks, poll every 5-10 seconds:
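import time
import requests

def wait_for_result(job_id, api_key, interval=5, max_wait=600):
    """Poll the job endpoint until the job completes, fails, or max_wait seconds pass."""
    # max_wait is an arbitrary safety cap; processing typically takes 2-5 minutes.
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        response = requests.get(
            f"https://api.deepread.tech/v1/jobs/{job_id}",
            headers={"X-API-Key": api_key},
            timeout=30,
        )
        response.raise_for_status()  # surface HTTP errors instead of parsing an error body
        result = response.json()
        if result["status"] == "completed":
            return result["result"]
        elif result["status"] == "failed":
            raise RuntimeError(f"Job failed: {result.get('error')}")
        time.sleep(interval)
    raise TimeoutError(f"Job {job_id} did not finish within {max_wait} seconds")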
Separate confident fields from uncertain ones:
def process_extraction(data):
    """Split extracted fields into confident values and fields that need human review."""
    confident = {}
    needs_review = []
    for field, field_data in data.items():
        if field_data["hil_flag"]:
            needs_review.append({
                "field": field,
                "value": field_data["value"],
                "reason": field_data.get("reason")
            })
        else:
            confident[field] = field_data["value"]
    # Auto-process confident fields
    # (save_to_database and send_to_review_queue are placeholders for your own functions)
    save_to_database(confident)
    # Send uncertain fields to review queue
    if needs_review:
        send_to_review_queue(needs_review)
quota_exceeded
{"detail": "Monthly page quota exceeded"}
Solution: Upgrade to PRO or wait until next billing cycle.
invalid_schema
{"detail": "Schema must be valid JSON Schema"}
Solution: Ensure schema is valid JSON and includes type and properties.
file_too_large
{"detail": "File size exceeds 50MB limit"}
Solution: Compress PDF or split into smaller files.
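The invalid_schema and file_too_large errors can often be caught before uploading. The sketch below applies the two checks named in the solutions above: the schema must be valid JSON containing type and properties, and the file must be within the 50MB limit. Interpreting 50MB as 50 × 1024 × 1024 bytes is an assumption, and preflight is an illustrative helper name.

```python
import json

MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: "50MB" read as 50 MiB


def preflight(schema_text, file_size_bytes):
    """Return a list of problems found before upload; an empty list means none found."""
    problems = []
    try:
        schema = json.loads(schema_text)
    except json.JSONDecodeError as exc:
        problems.append(f"schema is not valid JSON: {exc.msg}")
    else:
        if not isinstance(schema, dict):
            problems.append("schema must be a JSON object")
        else:
            for key in ("type", "properties"):
                if key not in schema:
                    problems.append(f"schema is missing '{key}'")
    if file_size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds the 50MB limit")
    return problems


print(preflight('{"type": "object"}', 1024))
```

These checks are local and partial. The server remains the authority on what counts as a valid schema.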
failed
{"status": "failed", "error": "PDF could not be processed"}
Common causes:
Invoice schema:
{
"type": "object",
"properties": {
"invoice_number": {
"type": "string",
"description": "Unique invoice ID"
},
"invoice_date": {
"type": "string",
"description": "Invoice date in MM/DD/YYYY format"
},
"vendor": {
"type": "string",
"description": "Vendor company name"
},
"total": {
"type": "number",
"description": "Total amount due including tax"
},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"quantity": {"type": "number"},
"price": {"type": "number"}
}
}
}
}
}
Receipt schema:
{
"type": "object",
"properties": {
"merchant": {
"type": "string",
"description": "Store or merchant name"
},
"date": {
"type": "string",
"description": "Transaction date"
},
"total": {
"type": "number",
"description": "Total amount paid"
},
"items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"price": {"type": "number"}
}
}
}
}
}
Contract schema:
{
"type": "object",
"properties": {
"parties": {
"type": "array",
"items": {"type": "string"},
"description": "Names of all parties in the contract"
},
"effective_date": {
"type": "string",
"description": "Contract start date"
},
"term_length": {
"type": "string",
"description": "Duration of contract"
},
"termination_clause": {
"type": "string",
"description": "Conditions for termination"
}
}
}
Ready to start? Get your free API key at https://www.deepread.tech/dashboard/?utm_source=clawdhub
Generated Mar 1, 2026
Automates extraction of vendor details, invoice dates, and totals from PDF invoices, flagging uncertain fields for manual review. Reduces manual data entry by 90% and integrates with accounting software via webhooks for real-time processing.
Extracts key clauses, dates, and parties from legal documents with high accuracy, using structured schemas to output JSON data. Flags ambiguous terms for lawyer review, speeding up due diligence and compliance checks.
Converts scanned patient forms and prescriptions into structured data, extracting fields like patient names, dates, and medications. Flags low-confidence entries for healthcare staff review, ensuring accuracy in electronic health records.
Processes receipts from images or PDFs to extract merchant names, amounts, and dates, integrating with expense software. Reduces manual review by flagging only unclear items, streamlining reimbursement workflows for employees.
Extracts titles, authors, and abstracts from research PDFs for library databases or citation tools. Uses multi-model consensus to handle varied formats, flagging uncertain data for librarian verification to maintain quality.
Offers a free tier with 2,000 pages per month to attract small businesses and developers, then charges based on usage volume or advanced features like custom schemas. Generates revenue from enterprise plans with higher limits and priority support.
Monetizes through API calls, with pricing tiers based on monthly page volume or processing speed. Targets developers and enterprises by providing scalable OCR without infrastructure costs, with add-ons for HIL review interfaces.
Licenses the OCR technology to other software companies for integration into their products, such as document management systems or workflow tools. Revenue comes from licensing fees and custom development services for specific use cases.
💬 Integration Tip
Use webhooks for asynchronous processing to avoid blocking your application, and start with the free tier to test accuracy before scaling.