
Nano PDF Skill: Edit Any PDF Page with Plain English Using Gemini AI

March 13, 2026·6 min read

14,591+ downloads and 37 stars on ClawHub. The nano-pdf skill by @steipete (Peter Steinberger, OpenClaw's founder) wraps the nano-pdf CLI — a tool that lets you edit any PDF page with a plain English instruction. Describe the change you want; Gemini handles the layout interpretation and rendering. No PDF library knowledge, no coordinate systems, no structure manipulation required.

The Problem It Solves

PDFs are notoriously painful to edit programmatically. PDF structure is complex — fonts, coordinates, content streams, cross-references. Tools like PyPDF2, pdfminer, or reportlab require deep domain knowledge. And most "PDF editors" are GUI applications that can't be scripted.

The nano-pdf approach is different: instead of manipulating PDF internals, it converts the target page to an image, sends it to Gemini with your instruction, gets back an edited image, and re-integrates it into the PDF with an OCR text layer. You describe what you want in English; the AI handles the rest.

How It Works (Under the Hood)

PDF → page image → Gemini 3 Pro Image + your instruction → edited image → PDF
                                                              ↑ OCR re-hydrates text layer

1. Target page is rendered as an image
2. Image + instruction sent to Gemini 3 Pro Image (nicknamed "Nano Banana")
3. Gemini returns an edited image
4. Image is re-embedded into the PDF
5. OCR runs on the new image to restore a searchable text layer

The result is a PDF where the target page reflects your changes. Other pages are untouched.
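
The five steps above can be approximated with common command-line tools. The sketch below only prints the commands it would run, and it is not nano-pdf's actual implementation. It assumes poppler's pdftoppm for rendering, tesseract for the OCR text layer, and qpdf for swapping the page back in; gemini-edit-image is a hypothetical placeholder for the model call, and the file names are illustrative.

```shell
# Print (do not run) a rough equivalent of the pipeline for one page.
# Assumptions: pdftoppm, tesseract, and qpdf stand in for whatever nano-pdf
# uses internally; gemini-edit-image is a hypothetical placeholder command.
sketch_pipeline() (
  pdf=$1; page=$2; total=$3; instruction=$4   # page is 1-based in this sketch
  # 1. Render only the target page to page.png
  echo "pdftoppm -f $page -l $page -r 150 -png -singlefile $pdf page"
  # 2-3. Send image + instruction to the model, receive the edited image
  echo "gemini-edit-image page.png edited.png \"$instruction\""
  # 5. OCR the edited image into a one-page PDF with a text layer
  echo "tesseract edited.png edited pdf"
  # 4. Rebuild the document: pages before, the edited page, pages after
  sel="edited.pdf 1"
  if [ "$page" -gt 1 ]; then sel="$pdf 1-$((page - 1)) $sel"; fi
  if [ "$page" -lt "$total" ]; then sel="$sel $pdf $((page + 1))-z"; fi
  echo "qpdf --empty --pages $sel -- out.pdf"
)
```

Calling sketch_pipeline deck.pdf 3 10 "Fix the typo" prints four commands, the last of which keeps pages 1-2 and 4 onward from the original and takes page 3 from the OCR'd image.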

Installation

# Install via pip
pip install nano-pdf
 
# Or via ClawHub
clawhub install nano-pdf
 
# Via uv (recommended for isolated environments)
uv tool install nano-pdf

Requires a Gemini API key (paid tier — Gemini 3 Pro Image is not available on the free tier).
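
Since every edit needs the key, a script can fail fast before any call is made. This sketch assumes the key is supplied through an environment variable named GEMINI_API_KEY; confirm the exact variable your version reads. The function name require_gemini_key is illustrative.

```shell
# Return non-zero with a message if the Gemini key is missing.
# GEMINI_API_KEY is an assumed variable name; confirm it for your version.
require_gemini_key() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set; nano-pdf edits need a paid-tier Gemini key" >&2
    return 1
  fi
}

# Usage:
# require_gemini_key && nano-pdf edit deck.pdf 1 "Change the title to 'Q3 Results'"
```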

Core Commands

Edit an Existing Page

nano-pdf edit deck.pdf 1 "Change the title to 'Q3 Results' and fix the typo in the subtitle"

Parameters:

deck.pdf — source file
1 — page number (note: may be 0-based or 1-based depending on version — if results are off by one, retry with the other)
Instruction string — plain English description of the change

More examples:

# Fix a typo
nano-pdf edit report.pdf 3 "Fix the typo 'recieve' to 'receive' in the second paragraph"
 
# Update a number
nano-pdf edit slides.pdf 7 "Update the revenue figure to $2.5M"
 
# Change branding
nano-pdf edit deck.pdf 1 "Replace 'Acme Corp' with 'Globex Inc' throughout the slide"
 
# Visual layout change
nano-pdf edit proposal.pdf 5 "Move the chart to the right side and add a title 'Growth Trend'"
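
Because the page argument may be 0-based or 1-based, a script can keep the offset in one place instead of scattering corrections through every call. In this sketch, page_arg and the NANO_PDF_PAGE_BASE variable are illustrative names, not part of the CLI.

```shell
# Convert a human page number (1 = first page) into the CLI argument.
# NANO_PDF_PAGE_BASE is a hypothetical setting: 1 if your version counts
# from one (the default assumed here), 0 if it counts from zero.
page_arg() {
  echo $(( $1 - 1 + ${NANO_PDF_PAGE_BASE:-1} ))
}

# Usage: try one edit on a copy, check which page changed, then set the base once.
# cp deck.pdf deck-test.pdf
# nano-pdf edit deck-test.pdf "$(page_arg 3)" "Fix the typo in the subtitle"
```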

Add a New Slide

nano-pdf add deck.pdf 15 "Create an executive summary slide with 5 bullet points summarizing the key findings"

The model analyzes the existing deck's visual style and generates a new slide that matches the aesthetic — fonts, colors, layout structure.

Practical Use Cases

Last-minute slide fixes: You're about to present and notice a typo on slide 3. Instead of reopening PowerPoint, finding the element, fixing it, and re-exporting:

nano-pdf edit presentation.pdf 3 "Fix the typo 'implimentation' to 'implementation'"

Automated report updates: A cron job pulls the latest metrics and updates the PDF with current numbers before it's distributed:

# Update the key metrics slide with fresh data
nano-pdf edit monthly-report.pdf 2 "Update the DAU figure to 124,500 and MRR to $890K"

CI/CD document pipeline: Before a scheduled report sends, inject the latest data:

# Shell script as part of a pipeline
METRICS=$(./fetch-metrics.sh)
nano-pdf edit report-template.pdf 1 "Update all metrics: ${METRICS}"

Style-consistent new slides: Add a slide that matches the existing deck's design without opening a graphics tool:

nano-pdf add investor-deck.pdf 12 "Add a 'Team' slide with three columns: Engineering, Product, and Design, with placeholder names"
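
For the cron and CI/CD cases, it helps to run the edit against a copy of the template and to support a dry run, so an odd result never touches the original. A minimal sketch: update_report, DRY_RUN, and the file names are illustrative, and only the nano-pdf edit form shown earlier is used.

```shell
# Copy the template, then apply one edit to the copy.
# update_report and DRY_RUN are illustrative names; with DRY_RUN=1 the
# nano-pdf command is printed instead of executed.
update_report() (
  template=$1; out=$2; page=$3; instruction=$4
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'nano-pdf edit %s %s "%s"\n' "$out" "$page" "$instruction"
    exit 0
  fi
  cp "$template" "$out" || exit 1
  nano-pdf edit "$out" "$page" "$instruction"
)

# Usage (metrics come from the pipeline's own fetch step):
# METRICS=$(./fetch-metrics.sh)
# update_report report-template.pdf "report-$(date +%Y-%m-%d).pdf" 1 "Update all metrics: ${METRICS}"
```

Running with DRY_RUN=1 in a pull-request job shows exactly which instruction would be sent without spending an API call.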

Known Trade-offs (From the HN Discussion)

The Hacker News thread about nano-pdf surfaced honest trade-offs worth knowing:

File size increases: Pages converted to images are larger than their original PDF content stream equivalents. Decks with many edited pages can grow significantly.
Lossy round-trip: The image → OCR path loses text bounding box metadata. This can affect accessibility tools that rely on precise text positioning.
Gemini API cost: Gemini 3 Pro Image is a paid model. High-volume editing (dozens of pages) will accumulate API costs.
Page numbering ambiguity: The CLI's page index may be 0-based or 1-based depending on the version. Always verify with a test edit first.
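
The file size trade-off is easy to measure before an edited deck goes out. The helper below is a sketch with an illustrative name; it reports growth as a whole-number percentage of the original size, truncated toward zero.

```shell
# Print how much larger (in percent, truncated) EDITED is than ORIGINAL.
size_growth_pct() (
  [ -f "$1" ] && [ -f "$2" ] || { echo "usage: size_growth_pct ORIGINAL EDITED" >&2; exit 1; }
  before=$(( $(wc -c < "$1") ))   # arithmetic expansion strips any padding from wc
  after=$(( $(wc -c < "$2") ))
  [ "$before" -gt 0 ] || { echo "original file is empty" >&2; exit 1; }
  echo $(( (after - before) * 100 / before ))
)

# Usage:
# size_growth_pct deck-original.pdf deck-edited.pdf
```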

Considerations

Requires Gemini API key — not free. Budget accordingly for batch operations.
Output quality depends on input quality: Blurry or low-resolution source PDFs produce lower-quality edits.
Descriptive instructions work better than low-level ones: The model handles layout context well, so "Move the chart to the right and add a legend" works better than trying to specify pixel coordinates.
Not suitable for form fields or interactive PDFs: The image-based approach strips interactivity. If your PDF has form fields, they'll be flattened in edited pages.
Sanity-check outputs: The SKILL.md explicitly says "Always sanity-check the output PDF before sending it out." The AI may interpret ambiguous instructions differently than intended.
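
Part of that sanity check can be automated by confirming the text layer contains the strings you asked for. The sketch assumes poppler's pdftotext is installed for the extraction step; contains_all is an illustrative helper that reads text on stdin. OCR can misread characters, so a miss is a prompt to look at the page, not proof the edit failed.

```shell
# Succeed only if every argument appears verbatim in the text read from stdin.
contains_all() (
  text=$(cat)
  for needle in "$@"; do
    printf '%s\n' "$text" | grep -qF -- "$needle" || {
      echo "missing: $needle" >&2
      exit 1
    }
  done
)

# Usage (pdftotext ships with poppler; "-" writes the text to stdout):
# pdftotext -f 3 -l 3 presentation.pdf - | contains_all "implementation"
```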

Comparison: nano-pdf vs. Traditional PDF Editing

Approach	Knowledge Required	Scripting	Quality
nano-pdf	None (plain English)	✅ Full CLI	AI-generated image
PyPDF2 / pdfminer	Deep PDF internals	✅ Python	Text-layer only
reportlab	PDF generation API	✅ Python	High (programmatic)
Adobe Acrobat	GUI familiarity	❌ Manual	High
LibreOffice CLI	Medium	⚠️ Limited	Medium

nano-pdf occupies a unique niche: fully scriptable with no domain knowledge required, in exchange for AI-generated output that is good but not pixel-perfect.

The Bigger Picture

nano-pdf represents a new class of tool: natural language as the editing interface for binary formats. PDFs have resisted programmatic manipulation for decades because their structure is complex and fragile. The image-based approach sidesteps that complexity entirely — treat the page as a visual, let a vision model understand and modify it, then put it back.

As Gemini's image reasoning improves, so will the quality of the edits. The nano-pdf skill bets on that trajectory, and at 37 stars and 14,591+ downloads on ClawHub, there's clear appetite for exactly this kind of tool.

View the skill on ClawHub: nano-pdf
