Show HN: Nano PDF – A CLI Tool to Edit PDFs with Gemini's Nano Banana
Posted by GavCo 10 days ago
The new Gemini 3 Pro Image model (aka Nano Banana) is incredible at generating slides, so I thought it would be fun to build a CLI tool that lets you edit PDF presentations using plain English. The tool converts the page you want to edit into an image, sends it to the model API together with your prompt to generate an edited image, then converts the edited image back and stitches it into the original document.
Examples:
- `nano-pdf edit deck.pdf 5 "Update the revenue chart to show Q3 at $2.5M"`
- `nano-pdf add deck.pdf 15 "Create an executive summary slide with 5 bullet points"`
Features:
- Edit multiple pages in parallel
- Add entirely new slides that match your deck's style
- Google Search enabled by default so the model can look up current data
- Preserves text layer for copy/paste and search
It can work with any kind of PDF but I expect it would be most useful for a quick edit to a deck or something similar.
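A minimal sketch of the edit-and-stitch flow with parallel page edits, using only the Python standard library. It is not the tool's actual code: `edit_pages`, `fake_edit`, the `Page` type, and the 1-based page numbering are all assumptions made for illustration, and `fake_edit` stands in for the real render, model call, and convert-back step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A page is represented here as raw bytes (e.g. an encoded image). This is an
# illustrative stand-in, not nano-pdf's internal type.
Page = bytes


def edit_pages(
    pages: List[Page],
    prompts: Dict[int, str],
    edit_fn: Callable[[Page, str], Page],
    max_workers: int = 4,
) -> List[Page]:
    """Return a new page list with the requested pages replaced by edits.

    `prompts` maps 1-based page numbers (an assumption) to prompt text.
    """
    for number in prompts:
        if not 1 <= number <= len(pages):
            raise ValueError(f"page {number} is out of range 1..{len(pages)}")

    result = list(pages)  # copy, so the input list is left untouched
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every edit up front so the slow calls overlap.
        futures = {
            number: pool.submit(edit_fn, pages[number - 1], prompt)
            for number, prompt in prompts.items()
        }
        # Stitch each edited page back into its original position.
        for number, future in futures.items():
            result[number - 1] = future.result()
    return result


def fake_edit(page: Page, prompt: str) -> Page:
    """Hypothetical stand-in for the image-model call."""
    return page + b" | edited: " + prompt.encode("utf-8")
```

Calling `edit_pages(pages, {2: "fix the chart", 5: "add a title"}, fake_edit)` would run both edits concurrently and leave every other page as it was. Threads fit here because the slow part is waiting on a network call rather than doing local computation.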
Comments
Comment by tecoholic 10 days ago
Does this mean a text-only PDF page is transformed into an image that covers the full page, with the text still underneath? So any machine-based extraction would still get the text but would probably lose all the bounding box information, and regular users could no longer select text with their mouse?
Comment by kumarm 10 days ago
My text-to-speech app uses bounding boxes to highlight which text in the PDF is being read, so it would not work well with PDFs from this project.
Comment by GavCo 10 days ago
Comment by lxe 10 days ago
Comment by thenthenthen 10 days ago
Comment by esafak 9 days ago
Comment by shevis 10 days ago
Comment by falcor84 10 days ago
Comment by moezd 10 days ago
Many thanks to humanity for failing to standardise PDF and this project for paying interest on that tech debt with datacenter levels of energy consumption.
Comment by struc_so 8 days ago
Does this approach rewrite the entire file structure on save, or are you appending incremental updates to the EOF? Incremental is safer for corruption, but file size bloats quickly with AI-generated diffs.
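One rough way to check which strategy a tool used is to count the `%%EOF` markers in its output, since each incremental update appends a new cross-reference section and trailer that ends with its own `%%EOF`. The sketch below is only a heuristic: `count_eof_markers` is a hypothetical helper name, and linearized files or `%%EOF` bytes inside stream data can also produce extra markers.

```python
import re


def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count the %%EOF markers in a PDF's raw bytes.

    A file that was fully rewritten usually has one marker, while a file
    saved with incremental updates usually has one per saved revision.
    """
    return len(re.findall(rb"%%EOF", pdf_bytes))


def looks_incrementally_updated(pdf_bytes: bytes) -> bool:
    """Heuristic: more than one %%EOF suggests appended revisions."""
    return count_eof_markers(pdf_bytes) > 1
```

Comparing the count before and after an edit, for example on the bytes from `open("deck.pdf", "rb").read()`, would hint at whether the file was appended to or rewritten.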
Comment by treetalker 10 days ago
Comment by jimmySixDOF 2 days ago
Comment by perfectritone 10 days ago
Comment by itsmevictor 10 days ago
Comment by mentalgear 10 days ago
Comment by yoavm 10 days ago
Comment by varenc 10 days ago
Comment by ornornor 10 days ago
Comment by varenc 9 days ago
Basically, .avif is an "animated image" format, like .gif, but .webm is only a video format.
Edit: I just realized that .webp can, I think, be an animated image too! So that seems like the alternative.
Comment by ornornor 9 days ago
Comment by iamflimflam1 10 days ago
Has anyone given it a go? Does it work?
Comment by stingraycharles 10 days ago
I haven’t tried it, but there are plenty of examples.
Comment by albert_e 10 days ago
But people here are probably also looking for example input and output PDFs (or images/screenshots) showing the actual work done to get a sense of what to expect.
Comment by iamflimflam1 10 days ago
Comment by ThrowawayTestr 10 days ago
Comment by tkfoss 10 days ago
Comment by albert_e 10 days ago
PS: In my quick test of editing text in a PDF, the output PDF had weirdly gained an extra "&" symbol at the end of every existing line of text. I will try it out more to see whether something in the input PDF was causing it.
Comment by fzysingularity 9 days ago
Comment by tkfoss 8 days ago
Comment by McNulty2 10 days ago
Comment by toddmorey 10 days ago
Comment by albert_e 10 days ago
Comment by mlpoknbji 10 days ago
Comment by ohans 7 days ago
Comment by informal007 10 days ago
Comment by vood 10 days ago
Comment by John7878781 10 days ago
After several iterations of edits, would the image quality decrease?
Comment by Zopieux 9 days ago
I wish an agent with validation and rendering tools could instead manipulate the PDF structure to accomplish those edits far less destructively, checking its progress with the tools.
Comment by mertleee 10 days ago
Comment by sultson 10 days ago