markitdown-skill
OpenClaw agent skill for converting documents to Markdown. Documentation and utilities for Microsoft's MarkItDown library. Supports PDF, Word, PowerPoint, Excel, images (OCR), audio (transcription), HTML, YouTube.
Install via ClawdBot CLI:
```bash
clawdbot install karmanverma/markitdown-skill
```
Documentation and utilities for converting documents to Markdown using Microsoft's MarkItDown library.
Note: This skill provides documentation and a batch script. The actual conversion is done by the markitdown CLI/library installed via pip.
Use markitdown for:
```bash
# Convert file to markdown
markitdown document.pdf -o output.md

# Convert URL
markitdown https://example.com/docs -o docs.md
```
| Format | Features |
|--------|----------|
| PDF | Text extraction, structure |
| Word (.docx) | Headings, lists, tables |
| PowerPoint | Slides, text |
| Excel | Tables, sheets |
| Images | OCR + EXIF metadata |
| Audio | Speech transcription |
| HTML | Structure preservation |
| YouTube | Video transcription |
The skill requires Microsoft's markitdown CLI:
```bash
pip install 'markitdown[all]'
```
Or install specific formats only:
```bash
pip install 'markitdown[pdf,docx,pptx]'
```
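As a rough sketch of how the format-specific install could be chosen, the helper below inspects a list of file names and builds the matching `pip install` command. The function name `pip_install_command` and the mapping are hypothetical; only the `pdf`, `docx`, and `pptx` extras named above are mapped, and any other extension simply falls back to `markitdown[all]`, which is a simplifying assumption rather than a statement about which formats need extras.

```python
from pathlib import Path

# Extras named in the text above; every other suffix falls back to "all"
# (an assumption made for simplicity, not a claim about markitdown itself).
EXTRAS_BY_SUFFIX = {".pdf": "pdf", ".docx": "docx", ".pptx": "pptx"}

FALLBACK = "pip install 'markitdown[all]'"


def pip_install_command(paths):
    """Return a pip command covering the formats found in `paths`."""
    extras = set()
    for p in paths:
        suffix = Path(p).suffix.lower()
        if suffix not in EXTRAS_BY_SUFFIX:
            # Unknown format: install everything rather than guess.
            return FALLBACK
        extras.add(EXTRAS_BY_SUFFIX[suffix])
    if not extras:
        return FALLBACK
    return "pip install 'markitdown[" + ",".join(sorted(extras)) + "]'"


print(pip_install_command(["report.pdf", "brief.docx"]))
# prints: pip install 'markitdown[docx,pdf]'
```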
```bash
markitdown https://github.com/user/repo/blob/main/README.md -o readme.md
markitdown document.pdf -o document.md
```
```bash
# Using included script
python ~/.openclaw/skills/markitdown/scripts/batch_convert.py docs/*.pdf -o markdown/ -v

# Or shell loop
for file in docs/*.pdf; do
  markitdown "$file" -o "${file%.pdf}.md"
done
```
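The shell loop can also be driven from Python when failures need to be collected rather than ignored. The sketch below is not the skill's `batch_convert.py`; `build_command` and `convert_all` are hypothetical names, and the only CLI behavior assumed is the `markitdown <input> -o <output>` form shown above.

```python
import subprocess
from pathlib import Path


def build_command(src, out_dir):
    """Build the markitdown CLI call for one file, writing <stem>.md to out_dir."""
    dst = Path(out_dir) / (Path(src).stem + ".md")
    return ["markitdown", str(src), "-o", str(dst)]


def convert_all(sources, out_dir, run=subprocess.run):
    """Run the CLI once per file and return the inputs whose conversion failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sources:
        result = run(build_command(src, out_dir))
        if result.returncode != 0:
            # A non-zero exit status is treated as a failed conversion.
            failed.append(str(src))
    return failed
```

A call such as `convert_all(sorted(Path("docs").glob("*.pdf")), "markdown")` would mirror the loop above, with the difference that the output lands in a separate directory and the list of failed inputs is returned to the caller.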
```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)
```
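To save the result instead of printing it, the same API call can be wrapped in a small helper. This is a minimal sketch: `convert_to_file` and its `convert` parameter are hypothetical, and the default path relies only on `MarkItDown().convert(...)` and `text_content` as used above. The injectable `convert` callable exists so the helper can be exercised without markitdown installed.

```python
from pathlib import Path


def convert_to_file(src, out_dir, convert=None):
    """Convert `src` and write <stem>.md into out_dir; return the output path."""
    if convert is None:
        # Default converter: the API call shown above, imported lazily.
        from markitdown import MarkItDown

        md = MarkItDown()

        def convert(path):
            return md.convert(path).text_content

    out_path = Path(out_dir) / (Path(src).stem + ".md")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(convert(str(src)), encoding="utf-8")
    return out_path
```

For example, `convert_to_file("document.pdf", "markdown")` would write `markdown/document.md`, assuming markitdown and its PDF dependencies are installed.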
```bash
pip install 'markitdown[all]'

# Ubuntu/Debian
sudo apt-get install tesseract-ocr

# macOS
brew install tesseract
```
| Component | Source |
|-----------|--------|
| markitdown CLI | Microsoft's pip package |
| markitdown Python API | Microsoft's pip package |
| scripts/batch_convert.py | This skill (utility) |
| Documentation | This skill |
Generated Mar 1, 2026
Law firms can convert PDF contracts, Word briefs, and scanned images into Markdown for easier text analysis, search, and archiving. This streamlines case preparation and compliance documentation by making content machine-readable and portable.
Researchers and universities can batch convert PDF papers, PowerPoint presentations, and Excel datasets into Markdown to aggregate literature reviews or create accessible study materials. This supports collaborative writing and data extraction for analysis.
Marketing agencies can transform HTML web pages, YouTube video transcripts, and Word documents into Markdown for repurposing content across blogs, social media, and documentation. This enhances SEO efforts by standardizing text formats for faster updates.
Enterprises can convert internal documents like PowerPoint decks, Excel reports, and audio meeting recordings into Markdown to build a searchable knowledge base. This improves information retrieval and onboarding processes for employees.
Media companies can use OCR to extract text from images and transcribe audio files into Markdown for subtitles, scripts, or archival purposes. This automates content preparation for publishing and accessibility compliance.
Offer a cloud-based service where users upload documents via a web interface or API to receive Markdown output, with tiered pricing based on volume and format support. Revenue comes from subscription plans and pay-per-use fees for high-volume clients.
Provide customized integration of the MarkItDown skill into existing corporate systems like CMS or legal software, along with training and technical support. Revenue is generated through licensing fees, implementation projects, and annual maintenance contracts.
Develop a free desktop or CLI tool for basic conversions, with premium upgrades for batch processing, advanced OCR, and API access. Revenue streams include one-time purchases for pro versions and in-app purchases for additional formats.
Integration Tip
Install markitdown via pip with specific format dependencies to reduce setup size, and use the included batch script for automating conversions in workflows.