literature-manager

Search, download, convert, organize, and audit academic literature collections. Use when asked to find papers, build a literature library, or add papers to an existing collection.
Install via ClawdBot CLI:
clawdbot install IsonaEi/literature-manager

Manage academic literature collections: search → download → convert → organize → verify.
Dependencies:
pdftotext (poppler-utils) — PDF text extraction
curl — downloading
python3 — JSON processing in audit
file (coreutils) — PDF validation
uvx markitdown[pdf] (optional) — fallback PDF→MD converter (note: plain uvx markitdown does NOT work for PDFs — must use uvx markitdown[pdf])

# Download a single paper by DOI
bash scripts/download.sh "10.1038/s41592-024-02200-1" output_dir/
# Convert PDF to markdown
bash scripts/convert.sh paper.pdf output.md
# Verify a single PDF+MD pair
bash scripts/verify.sh paper.pdf paper.md
# Full audit of a references/ folder
bash scripts/audit.sh /path/to/references/
Use web_fetch on Google Scholar:
https://scholar.google.com/scholar?q=QUERY&as_ylo=YEAR
Extract: title, authors, year, journal, DOI, PDF links.
For each result, identify the best open-access PDF source (see Download Strategy).
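The search query must be URL-encoded before it goes into the Scholar URL. A minimal sketch, using python3 (already a dependency) for the encoding; QUERY and YEAR are placeholder values:

```shell
# Build a Scholar search URL; QUERY and YEAR are placeholders
QUERY="single cell rna-seq"
YEAR=2020
# URL-encode the query with Python's stdlib
ENCODED=$(python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1]))' "$QUERY")
echo "https://scholar.google.com/scholar?q=${ENCODED}&as_ylo=${YEAR}"
```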
Run scripts/download.sh per paper. The script tries sources in order:
PMC (PMC_ID → PDF)
https://sci-hub.box/ (use when publisher is paywalled)

# Sci-Hub download example:
curl -L "https://sci-hub.box/10.1038/nature12345" -o paper.pdf
⚠️ Legal note: Sci-Hub may violate publisher terms of service or copyright law in some jurisdictions. Use only if you understand and accept the legal implications in your context.
If all sources fail (including Sci-Hub), flag as permanent paywall. Provide the user with the DOI and ask for manual download.
Run scripts/convert.sh. Uses pdftotext (reliable) with uvx markitdown[pdf] as fallback.
# Correct markitdown command for PDFs:
uvx markitdown[pdf] input.pdf > output.md
# ⚠️ The following will NOT work for PDFs (missing [pdf] extra):
# uvx markitdown input.pdf
Prefer uvx markitdown[pdf] over pdftotext when full fidelity (tables, figure captions) matters.
Standard folder structure:
references/
├── README.md # Human index (summaries per category)
├── index.json # Machine index (structured metadata)
├── RESOURCES.md # Code repos + datasets
├── resources.json # Structured version
├── <category-1>/
│ ├── papers/ # PDFs
│ └── markdown/ # Converted text
└── <category-N>/
├── papers/
└── markdown/
Categories are user-defined. Number-prefix for sort order (e.g., 01-theoretical-frameworks/).
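A minimal sketch to scaffold this layout; the two category names are hypothetical examples:

```shell
# Create the standard references/ layout for two hypothetical categories
for cat in 01-theoretical-frameworks 02-methods; do
  mkdir -p "references/$cat/papers" "references/$cat/markdown"
done
# Top-level index and resource files
touch references/README.md references/index.json \
      references/RESOURCES.md references/resources.json
```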
{
"id": "short_id",
"title": "Full title",
"authors": ["Author1", "Author2"],
"year": 2024,
"journal": "Journal Name",
"doi": "10.xxxx/...",
"category": "category_name",
"subcategory": "optional",
"pdf_path": "category/papers/filename.pdf",
"markdown_path": "category/markdown/filename.md",
"tags": ["tag1", "tag2"],
"one_line_summary": "English one-liner",
"key_concepts": ["concept1"],
"relevance_to_project": "English description"
}
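Assuming index.json holds a JSON array of entry objects like the one above (an assumption; the skill does not document the top-level shape), a quick query sketch:

```shell
# List year and title of every paper in a given category.
# Assumes index.json is a JSON array of entry objects.
list_category() {
  python3 - "$1" <<'EOF'
import json, sys
for e in json.load(open("index.json")):
    if e["category"] == sys.argv[1]:
        print(e["year"], e["title"])
EOF
}
```

Usage: list_category category_name prints one line per matching entry.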
In each category section of README.md, list per paper: title, authors, year, journal, DOI, and a short summary in the user's language.
Downloaded files are often named using DOI format rather than AuthorYear:
10-1038_ncomms3018.md # DOI: 10.1038/ncomms3018
10-1016_j-neuron-2015-03-034.md
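The naming rule inferred from these examples: replace / with _ and . with -. A sketch of that sanitizer (an assumption reconstructed from the filenames above, not the shipped download.sh logic):

```shell
# DOI → filesystem-safe stem: "/" becomes "_", "." becomes "-"
doi_to_name() {
  printf '%s' "$1" | tr '/' '_' | tr '.' '-'
}
```

For example, doi_to_name "10.1016/j.neuron.2015.03.034" yields 10-1016_j-neuron-2015-03-034, matching the file above.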
When markdown_path entries in index.json become stale (e.g., after folder reorganization), maintain a separate mapping file:
// temp/paper_md_mapping.json
{
"author2024_keyword": "references/new-downloads/10-1038_s41592-024-02200-1.md",
...
}
To build this mapping: cross-reference each paper's DOI in index.json against actual files on disk. Use find + Python to automate.
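The cross-reference step might look like the sketch below: sanitize each DOI from index.json with the downloader's filename rule and match it against markdown files on disk. This assumes index.json is a JSON array of the entry objects shown above; it is not the shipped audit logic:

```shell
# Rebuild temp/paper_md_mapping.json from index.json + files on disk
build_mapping() {
  mkdir -p temp
  python3 - <<'EOF'
import json, os, glob

index = json.load(open("index.json"))
# basename → full path for every markdown file under references/
files = {os.path.basename(p): p
         for p in glob.glob("references/**/*.md", recursive=True)}

def sanitize(doi):
    # same filename rule as the downloader: "/" -> "_", "." -> "-"
    return doi.replace("/", "_").replace(".", "-") + ".md"

mapping = {e["id"]: files[sanitize(e["doi"])]
           for e in index if sanitize(e["doi"]) in files}
with open("temp/paper_md_mapping.json", "w") as f:
    json.dump(mapping, f, indent=2)
EOF
}
```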
Known index.json issues:
id: null corruption: If many entries have id=null and share the same pdf_path, the index was likely corrupted during a batch write. Rebuild from actual files on disk.
Stale markdown_path: After restructuring folders, markdown_path in index.json often points to old locations. Use the mapping file above as the source of truth.

Run scripts/audit.sh for full verification:
PDF validity (file -b = PDF)
Text extractability (pdftotext | head)

For tool/method papers, find GitHub repos and public datasets. Store in RESOURCES.md + resources.json.
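A sketch of those two per-file checks (assumed behavior, not the shipped audit.sh):

```shell
# Per-file audit: magic-byte check + text extractability
check_pdf() {
  pdf="$1"
  # 1) a valid PDF starts with the "%PDF" magic bytes
  head -c 4 "$pdf" | grep -q '%PDF' || { echo "BAD: $pdf"; return 1; }
  # 2) pdftotext must yield non-empty text (scanned PDFs often fail here)
  [ -n "$(pdftotext "$pdf" - 2>/dev/null | head -c 200)" ] \
    || { echo "NO-TEXT: $pdf"; return 1; }
  echo "OK: $pdf"
}
```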
For large batches, parallelize:
Always use a separate sub-agent for verification (QC should not self-grade).
1. Spawn agent(s)
2. Immediately set a cron job (every 10-15 min, isolated agentTurn)
→ Check if expected output files exist
→ Re-spawn failed agents
→ When all complete: announce + delete cron
3. After task finishes, confirm cron was removed
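The "check expected output files" step of the watchdog could be sketched as follows; the agent and cron wiring is tool-specific, so only the generic file check is shown:

```shell
# Report any expected output file that is missing or empty
check_outputs() {
  status=0
  for f in "$@"; do
    [ -s "$f" ] || { echo "MISSING: $f"; status=1; }
  done
  return $status
}
```

A nonzero return signals that at least one agent should be re-spawned.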
To add papers to an existing collection, run the same search → download → convert → organize → verify pipeline, then update index.json and README.md.
Generated Mar 1, 2026
Researchers can use this skill to systematically gather, convert, and organize papers for a literature review. It automates downloading from sources like arXiv and Sci-Hub, converts PDFs to markdown for easier analysis, and structures references by topic, saving time in academic projects.
A biotech startup can employ this skill to collect and audit scientific papers on competitors' technologies. It helps download and convert relevant studies, organize them by therapeutic area, and extract code or dataset links for tool validation, supporting R&D strategy.
Companies in tech or engineering sectors can use this skill to build an internal literature library. It automates fetching papers on emerging technologies, converts them to searchable markdown, and audits the collection to ensure completeness, aiding innovation teams.
Publishing houses can utilize this skill to gather and organize academic content for editors. It searches for papers on specific topics, downloads PDFs from publishers, converts them for editorial review, and verifies metadata, streamlining content acquisition processes.
Government agencies can apply this skill to compile and audit scientific literature for policy-making. It downloads reports and studies, converts them to accessible formats, organizes by policy area, and verifies sources, ensuring evidence-based decision support.
Offer a cloud-based service where research teams pay a monthly subscription to access automated literature management. It provides scalable search, download, and conversion tools with audit features, targeting academic labs and corporate R&D departments.
Provide consulting services to organizations needing to audit or organize their reference collections. Use the skill's verification and organization capabilities to deliver structured reports and clean datasets, charging per project or hourly rates.
Develop a freemium desktop application based on this skill, offering basic search and download for free. Charge for advanced features like batch processing, priority support, and integration with reference managers, appealing to graduate students and independent researchers.
💬 Integration Tip
Integrate this skill with existing reference managers like Zotero by exporting index.json data, and use cron jobs to automate periodic audits for large collections.
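One way to bridge to Zotero is CSL-JSON, a format Zotero imports natively. The field mapping below is an assumption based on the index schema above, not a documented export path:

```shell
# Convert index.json entries to CSL-JSON (writes export_csl.json)
to_csl() {
  python3 - <<'EOF'
import json

entries = json.load(open("index.json"))
csl = [{
    "id": e["id"],
    "type": "article-journal",
    "title": e["title"],
    # "literal" keeps author strings as-is instead of splitting names
    "author": [{"literal": a} for a in e.get("authors", [])],
    "issued": {"date-parts": [[e["year"]]]},
    "container-title": e.get("journal", ""),
    "DOI": e.get("doi", ""),
} for e in entries]
with open("export_csl.json", "w") as f:
    json.dump(csl, f, indent=2)
EOF
}
```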