Nano PDF Skill: Edit Any PDF Page with Plain English Using Gemini AI
14,591+ downloads and 37 stars on ClawHub. The nano-pdf skill by @steipete (Peter Steinberger, OpenClaw's founder) wraps the nano-pdf CLI — a tool that lets you edit any PDF page with a plain English instruction. Describe the change you want; Gemini handles the layout interpretation and rendering. No PDF library knowledge, no coordinate systems, no structure manipulation required.
The Problem It Solves
PDFs are notoriously painful to edit programmatically. PDF structure is complex — fonts, coordinates, content streams, cross-references. Tools like PyPDF2, pdfminer, or reportlab require deep domain knowledge. And most "PDF editors" are GUI applications that can't be scripted.
The nano-pdf approach is different: instead of manipulating PDF internals, it converts the target page to an image, sends it to Gemini with your instruction, gets back an edited image, and re-integrates it into the PDF with an OCR text layer. You describe what you want in English; the AI handles the rest.
How It Works (Under the Hood)
PDF → page image → Gemini 3 Pro Image + your instruction → edited image → PDF (OCR re-hydrates the text layer)
- Target page is rendered as an image
- Image + instruction sent to Gemini 3 Pro Image (nicknamed "Nano Banana")
- Gemini returns an edited image
- Image is re-embedded into the PDF
- OCR runs on the new image to restore a searchable text layer
The result is a PDF where the target page reflects your changes. Other pages are untouched.
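To make the stages concrete, here is a dry-run bash sketch of the same flow built from common open-source tools: poppler's `pdftoppm` renders the page, Tesseract OCRs the edited image into a one-page searchable PDF, and `qpdf` splices that page back in. It is an illustration under stated assumptions, not nano-pdf's actual implementation. `plan_page_edit` and `edit-image-with-gemini` are hypothetical names (the latter stands in for the model call), the page argument is assumed to be 1-based, and the script only prints the commands it would run. In this sketch, OCR and re-embedding collapse into one step because Tesseract emits the image and its text layer together.

```shell
#!/usr/bin/env bash
# Dry-run sketch of a render -> edit -> OCR -> splice flow using pdftoppm,
# Tesseract and qpdf. NOT nano-pdf's implementation: `edit-image-with-gemini`
# is a hypothetical placeholder for the model call. Commands are only printed.
set -euo pipefail

# plan_page_edit SRC.pdf PAGE TOTAL_PAGES DST.pdf INSTRUCTION  (PAGE is 1-based)
plan_page_edit() {
  local src=$1 page=$2 total=$3 dst=$4 instruction=$5

  # 1. Render only the target page to page.png at 150 DPI.
  echo "pdftoppm -f $page -l $page -r 150 -png -singlefile $src page"

  # 2. Send image + instruction to the model (hypothetical placeholder).
  printf 'edit-image-with-gemini page.png edited.png %q\n' "$instruction"

  # 3. OCR the edited image into edited.pdf: the image plus a text layer.
  echo "tesseract edited.png edited pdf"

  # 4. Rebuild the document: pages before, the edited page, pages after.
  local -a parts=()
  if (( page > 1 )); then parts+=("$src" "1-$((page - 1))"); fi
  parts+=("edited.pdf" "1")
  if (( page < total )); then parts+=("$src" "$((page + 1))-z"); fi
  echo "qpdf $src --pages ${parts[*]} -- $dst"
}

plan_page_edit deck.pdf 3 10 deck-edited.pdf "Fix the typo in the subtitle"
```

Splicing with `qpdf --pages` is what leaves the other pages untouched: only the target page is swapped for the OCR'd image page.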
Installation
# Install via pip
pip install nano-pdf
# Or via ClawHub
clawhub install nano-pdf
# Via uv (recommended for isolated environments)
uv tool install nano-pdf
Requires a Gemini API key (paid tier: Gemini 3 Pro Image is not available on the free tier).
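Because every edit needs both the CLI and a paid key, a script can fail fast before making any API calls. A minimal bash sketch, assuming the key is read from a `GEMINI_API_KEY` environment variable (an assumption; check the nano-pdf README for the name your version expects). `preflight` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Preflight sketch: stop before any API call if the CLI or key is missing.
# ASSUMPTION: the key is read from GEMINI_API_KEY; confirm against the
# nano-pdf README for your version. `preflight` is a hypothetical name.
set -euo pipefail

preflight() {
  if ! command -v nano-pdf >/dev/null 2>&1; then
    echo "nano-pdf not found on PATH (install it with pip or uv)" >&2
    return 1
  fi
  if [[ -z "${GEMINI_API_KEY:-}" ]]; then
    echo "GEMINI_API_KEY is not set (variable name is an assumption)" >&2
    return 1
  fi
  echo "ok"
}

# Usage: preflight && nano-pdf edit deck.pdf 1 "Change the title to 'Q3 Results'"
```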
Core Commands
Edit an Existing Page
nano-pdf edit deck.pdf 1 "Change the title to 'Q3 Results' and fix the typo in the subtitle"
Parameters:
- deck.pdf: source file
- 1: page number (may be 0-based or 1-based depending on version; if results are off by one, retry with the other)
- Instruction string: plain English description of the change
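Given the page-numbering ambiguity, it can help to see which physical page a number would hit under each convention before paying for an edit. A small bash sketch, assuming poppler-utils is installed for `pdfinfo`. `page_count` and `describe_page` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: show which physical page a page argument would hit under 1-based
# and 0-based numbering. Helper names are hypothetical; pdfinfo is from
# poppler-utils.
set -euo pipefail

# page_count: read `pdfinfo` output on stdin and print the "Pages:" value.
page_count() {
  awk '/^Pages:/ { print $2; exit }'
}

# describe_page PAGE TOTAL: print the page each convention would target.
describe_page() {
  local page=$1 total=$2
  if (( page >= 1 && page <= total )); then
    echo "1-based: page $page of $total"
  else
    echo "1-based: out of range"
  fi
  if (( page >= 0 && page < total )); then
    echo "0-based: page $((page + 1)) of $total"
  else
    echo "0-based: out of range"
  fi
}

# Usage:
#   total=$(pdfinfo deck.pdf | page_count)
#   describe_page 1 "$total"
```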
More examples:
# Fix a typo
nano-pdf edit report.pdf 3 "Fix the typo 'recieve' to 'receive' in the second paragraph"
# Update a number
nano-pdf edit slides.pdf 7 'Update the revenue figure to $2.5M'
# Change branding
nano-pdf edit deck.pdf 1 "Replace 'Acme Corp' with 'Globex Inc' throughout the slide"
# Visual layout change
nano-pdf edit proposal.pdf 5 "Move the chart to the right side and add a title 'Growth Trend'"
Add a New Slide
nano-pdf add deck.pdf 15 "Create an executive summary slide with 5 bullet points summarizing the key findings"
The model analyzes the existing deck's visual style and generates a new slide that matches its aesthetic: fonts, colors, and layout structure.
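Single-page edits like the ones above can be batched. This bash sketch reads `PAGE<TAB>INSTRUCTION` lines from a file and applies each one to a copy of the input, so the original is never modified. It assumes `nano-pdf edit` updates the file it is given in place; if your version writes a separate output file, the loop needs adjusting. `batch_edit` and the `NANO_PDF` override (handy as `NANO_PDF=echo` for a dry run) are hypothetical. Reading instructions from a file also sidesteps shell expansion, so text like `$2.5M` arrives intact.

```shell
#!/usr/bin/env bash
# Sketch: apply several edits to a COPY of a PDF from a tab-separated file.
# Each line: PAGE<TAB>INSTRUCTION; blank lines and lines starting with # are
# skipped. ASSUMPTION: `nano-pdf edit` updates the file it is given in place.
# `batch_edit` and NANO_PDF (e.g. NANO_PDF=echo for a dry run) are hypothetical.
set -euo pipefail

batch_edit() {
  local src=$1 dst=$2 edits=$3
  local nano=${NANO_PDF:-nano-pdf}
  local page="" instruction=""

  cp -- "$src" "$dst"   # leave the original untouched

  # `read -r` keeps backslashes, and text read from a file is never
  # shell-expanded, so "$2.5M" stays literal. The `|| [[ -n ... ]]` also
  # processes a final line that lacks a trailing newline.
  while IFS=$'\t' read -r page instruction || [[ -n "$page" ]]; do
    if [[ -z "$page" || "$page" == \#* ]]; then
      continue
    fi
    "$nano" edit "$dst" "$page" "$instruction"
  done < "$edits"
}

# Usage: batch_edit deck.pdf deck-fixed.pdf edits.tsv
```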
Practical Use Cases
Last-minute slide fixes: You're about to present and notice a typo on slide 3. Instead of reopening PowerPoint, finding the element, fixing it, and re-exporting:
nano-pdf edit presentation.pdf 3 "Fix the typo 'implimentation' to 'implementation'"
Automated report updates: A cron job pulls the latest metrics and updates the PDF with current numbers before it's distributed:
# Update the key metrics slide with fresh data
nano-pdf edit monthly-report.pdf 2 'Update the DAU figure to 124,500 and MRR to $890K'
CI/CD document pipeline: Before a scheduled report is sent, inject the latest data:
# Shell script as part of a pipeline
METRICS=$(./fetch-metrics.sh)
nano-pdf edit report-template.pdf 1 "Update all metrics: ${METRICS}"
Style-consistent new slides: Add a slide that matches the existing deck's design without opening a graphics tool:
nano-pdf add investor-deck.pdf 12 "Add a 'Team' slide with three columns: Engineering, Product, and Design, with placeholder names"
Known Trade-offs (From the HN Discussion)
The Hacker News thread about nano-pdf surfaced honest trade-offs worth knowing:
- File size increases: Pages converted to images are larger than their original PDF content stream equivalents. Decks with many edited pages can grow significantly.
- Lossy round-trip: The image → OCR path loses text bounding box metadata. This can affect accessibility tools that rely on precise text positioning.
- Gemini API cost: Gemini 3 Pro Image is a paid model. High-volume editing (dozens of pages) will accumulate API costs.
- Page numbering ambiguity: The CLI's page index may be 0-based or 1-based depending on the version. Always verify with a test edit first.
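To keep an eye on the file-size trade-off, compare byte counts before and after an edit. A minimal bash sketch; `size_growth` is a hypothetical helper, and `wc -c` is used because `stat` flags differ between GNU and BSD systems.

```shell
#!/usr/bin/env bash
# Sketch: report how much an edited PDF grew relative to the original.
# `size_growth` is a hypothetical helper; `wc -c` is used because `stat`
# flags differ between GNU and BSD systems.
set -euo pipefail

# size_growth BEFORE.pdf AFTER.pdf: print the integer percentage change.
size_growth() {
  local before after
  before=$(wc -c < "$1")
  after=$(wc -c < "$2")
  if (( before == 0 )); then
    echo "empty input: $1" >&2
    return 1
  fi
  # Bash arithmetic is integer-only and truncates toward zero.
  echo "$(( (after - before) * 100 / before ))%"
}

# Usage: size_growth deck.pdf deck-edited.pdf
```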
Considerations
- Requires a Gemini API key: Usage is not free. Budget accordingly for batch operations.
- Output quality depends on input quality: Blurry or low-resolution source PDFs produce lower-quality edits.
- Descriptive instructions work better than precise coordinates: The model handles layout context well. "Move the chart to the right and add a legend" works better than trying to specify pixel coordinates.
- Not suitable for form fields or interactive PDFs: The image-based approach strips interactivity. If your PDF has form fields, they'll be flattened in edited pages.
- Sanity-check outputs: The SKILL.md explicitly says "Always sanity-check the output PDF before sending it out." The AI may interpret ambiguous instructions differently than intended.
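Two of these considerations can be checked from the shell with poppler-utils. The bash sketch below flags inputs that carry form fields (which would be flattened) and verifies, after an edit, that an expected string appears and a forbidden one does not in a page's extracted text. It assumes a poppler version whose `pdfinfo` prints a `Form:` line; `has_form_fields` and `check_page_text` are hypothetical names. Because OCR can split or misread words, treat a failed text check as a prompt for manual review rather than proof that the edit failed.

```shell
#!/usr/bin/env bash
# Sketch: two checks around an edit, reading poppler-utils output on stdin.
# `has_form_fields` and `check_page_text` are hypothetical helper names.
set -euo pipefail

# has_form_fields: succeed if `pdfinfo` reports a form (AcroForm or XFA).
# ASSUMPTION: your poppler version prints a "Form:" line.
has_form_fields() {
  awk '/^Form:/ { found = ($2 != "none") } END { exit (found ? 0 : 1) }'
}

# check_page_text EXPECTED FORBIDDEN: succeed if EXPECTED appears and
# FORBIDDEN does not in the text read from stdin (e.g. `pdftotext` output).
check_page_text() {
  local text
  text=$(cat)
  if ! grep -qF -- "$1" <<< "$text"; then
    echo "missing expected text: $1" >&2
    return 1
  fi
  if grep -qF -- "$2" <<< "$text"; then
    echo "still contains: $2" >&2
    return 1
  fi
}

# Usage:
#   pdfinfo report.pdf | has_form_fields && echo "form fields will be flattened"
#   pdftotext -f 3 -l 3 report.pdf - | check_page_text receive recieve
```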
Comparison: nano-pdf vs. Traditional PDF Editing
| Approach | Knowledge Required | Scripting | Quality |
|---|---|---|---|
| nano-pdf | None (plain English) | ✅ Full CLI | AI-generated image |
| PyPDF2 / pdfminer | Deep PDF internals | ✅ Python | Text-layer only |
| reportlab | PDF generation API | ✅ Python | High (programmatic) |
| Adobe Acrobat | GUI familiarity | ❌ Manual | High |
| LibreOffice CLI | Medium | ⚠️ Limited | Medium |
nano-pdf occupies a unique niche: fully scriptable, no domain knowledge required, but AI-quality output (good but not pixel-perfect).
The Bigger Picture
nano-pdf represents a new class of tool: natural language as the editing interface for binary formats. PDFs have resisted programmatic manipulation for decades because their structure is complex and fragile. The image-based approach sidesteps that complexity entirely — treat the page as a visual, let a vision model understand and modify it, then put it back.
As Gemini's image reasoning improves, the quality of the edits should improve with it. The nano-pdf skill bets on that trajectory, and at 37 stars and 14,591+ downloads on ClawHub, there's clear appetite for exactly this kind of tool.
View the skill on ClawHub: nano-pdf