doc-to-text
Doc to Text - extract plain readable text from Word (.doc/.docx) documents using MinerU. Output is Markdown (the closest plain-text format MinerU supports).
Install via ClawdBot CLI:
clawdbot install mzlzyca/doc-to-text
Grade: Fair, based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Calls external URL not in known-safe list
https://mineru.net
Audited Apr 16, 2026 · audit v1.0
Generated May 11, 2026
Law firms and legal departments can extract plain text from Word documents such as contracts, briefs, and discovery materials. This enables efficient searching, indexing, and integration into document management systems or AI analysis pipelines.
Researchers and students can convert .doc/.docx papers to Markdown or extract text for literature reviews, citation management, or content repurposing. The tool supports both English and Chinese, making it suitable for multilingual academic work.
Businesses can automate the extraction of text from internal reports, proposals, and memos for data entry, compliance checks, or knowledge base building. Integration with CI/CD pipelines allows for batch processing.
Publishers and content creators can extract text from old Word files to migrate to new CMS platforms, generate ebook drafts, or repurpose content for web/mobile. Markdown output facilitates conversion to HTML or other formats.
Organizations can archive legacy .doc files by extracting readable text, preserving content in a future-proof format. This also aids accessibility by providing plain text for screen readers.
Offer free flash-extract for .docx files as a hook, while charging for .doc extraction and advanced features via token purchases. Revenue comes from token sales on the MinerU platform.
Provide a cloud service with a monthly subscription for high-volume document processing, including .doc support, JSON output, and language hints. Additional tiers for team collaboration and API access.
License the MinerU tool for enterprise on-premise use, ensuring data sovereignty and compliance. Includes priority support, custom integrations, and SLA guarantees.
💬 Integration Tip
Set MINERU_TOKEN as an environment variable for seamless authentication. For .docx files, use flash-extract to skip token setup entirely.
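The tip above can be read as a small decision rule, sketched here in Python. This is only an illustration under stated assumptions: `pick_extraction_mode` is a hypothetical helper, the returned strings are plain labels rather than real MinerU commands or flags, and preferring the token path whenever `MINERU_TOKEN` is set is an assumption, not documented behavior.

```python
import os
from pathlib import Path

# Environment variable name taken from the integration tip.
TOKEN_ENV_VAR = "MINERU_TOKEN"


def pick_extraction_mode(path, env=None):
    """Hypothetical helper: decide which extraction path a Word file needs.

    Returns the label "token" or "flash-extract". These are illustrative
    labels only, not MinerU CLI flags.
    """
    env = os.environ if env is None else env
    suffix = Path(path).suffix.lower()

    if suffix not in {".doc", ".docx"}:
        raise ValueError(f"not a Word document: {path}")

    # Assumption: when a token is configured, use it for both formats.
    if env.get(TOKEN_ENV_VAR):
        return "token"

    # Per the tip, .docx files can go through flash-extract without a token.
    if suffix == ".docx":
        return "flash-extract"

    # .doc files have no token-free path in this sketch.
    raise RuntimeError(f"{TOKEN_ENV_VAR} must be set to process .doc files")
```

Passing `env` explicitly (for example `pick_extraction_mode("brief.docx", env={})`) keeps the check easy to exercise without touching the real environment.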
Scored Jun 20, 2026
Edit PDFs with natural-language instructions using the nano-pdf CLI.
Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
Create, inspect, and edit Microsoft Word documents and DOCX files with reliable styles, numbering, tracked changes, tables, sections, and compatibility check...
Create, inspect, and edit Microsoft Excel workbooks and XLSX files with reliable formulas, dates, types, formatting, recalculation, and template preservation...
Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, YouTube URLs, or EPubs to Markdown format for LLM processing or text analysis.
Create, inspect, and edit Microsoft PowerPoint presentations and PPTX decks with reliable layouts, templates, placeholders, notes, charts, and visual QA. Use...