Show HN: Ocrbase – pdf → .md/.json document OCR and structured extraction API
Posted by adammajcher 4 days ago
Comments
Comment by sync 3 days ago
The "guts" are here: https://github.com/majcheradam/ocrbase/blob/7706ef79493c47e8...
Comment by tuwtuwtuwtuw 3 days ago
Do people actually store their secrets in plain text on the file system in production environments? Just seems a bit wild to me.
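For illustration only, a minimal sketch of the usual alternative: inject the secret via an environment variable (populated by a secret manager or the orchestrator) rather than a plaintext file on disk. The variable name here is hypothetical:

    import os

    def load_api_key() -> str:
        # Prefer an injected environment variable over a plaintext file;
        # the env var is typically populated by a secret manager at deploy time.
        key = os.environ.get("OCRBASE_API_KEY")  # hypothetical variable name
        if not key:
            raise RuntimeError("OCRBASE_API_KEY is not set")
        return key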
Comment by prats226 3 days ago
We recently published a cookbook for constrained decoding here: https://nanonets.com/cookbooks/structured-llm-outputs/
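For context, a minimal Python sketch of the core idea (schema-validated extraction with retry; not the cookbook's exact API). The Invoice fields and the call_model function are hypothetical stand-ins:

    from pydantic import BaseModel, ValidationError

    class Invoice(BaseModel):  # hypothetical target schema
        vendor: str
        total: float
        currency: str

    def extract(call_model, document_text: str, retries: int = 3) -> Invoice:
        # call_model stands in for any LLM call that returns a JSON string.
        prompt = f"Return JSON matching {Invoice.model_json_schema()}:\n{document_text}"
        for _ in range(retries):
            raw = call_model(prompt)
            try:
                return Invoice.model_validate_json(raw)  # reject non-conforming output
            except ValidationError:
                continue  # re-sample; true constrained decoding masks invalid tokens instead
        raise RuntimeError("model never produced schema-valid JSON")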
Comment by constantinum 3 days ago
Equally important is how easily you can build a human-in-the-loop review layer on top of the tool. This is needed not only to improve accuracy, but also for compliance, especially in regulated industries like insurance (a minimal sketch of such a review queue follows the tool list below).
Other tools in this space:
LLMWhisperer/Unstract (AGPL)
Reducto
Extend AI
LlamaParse
Docling
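As promised above, a minimal sketch of a confidence-gated review queue, assuming each extracted field carries a confidence score (the threshold and field shape are hypothetical):

    from dataclasses import dataclass

    REVIEW_THRESHOLD = 0.85  # hypothetical; tune per field and per compliance requirement

    @dataclass
    class Field:
        name: str
        value: str
        confidence: float

    def route(fields: list[Field]) -> tuple[list[Field], list[Field]]:
        # Auto-accept confident fields; queue the rest for human review.
        auto = [f for f in fields if f.confidence >= REVIEW_THRESHOLD]
        review = [f for f in fields if f.confidence < REVIEW_THRESHOLD]
        return auto, review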
Comment by binalpatel 3 days ago
https://binal.pub/2023/12/structured-ocr-with-gpt-vision/
Back of the napkin math (which I could be messing up completely) but I think you could process a 100 page PDF for ~$0.50 or less using Gemini 3 Flash?
> 560 input tokens per page * 100 pages = 56,000 tokens = $0.028 input ($0.50/M input tokens)
> ~1,000 output tokens per page * 100 pages = 100,000 tokens = $0.30 output ($3/M output tokens)
(https://ai.google.dev/gemini-api/docs/gemini-3#media_resolut...)
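The arithmetic above, spelled out (token counts and prices are the commenter's figures, not independently confirmed):

    pages = 100
    in_tok_per_page, out_tok_per_page = 560, 1000  # figures from the comment above
    in_price, out_price = 0.50 / 1e6, 3.00 / 1e6   # $ per token

    cost = (pages * in_tok_per_page * in_price
            + pages * out_tok_per_page * out_price)
    print(f"${cost:.3f}")  # 0.028 + 0.30 = $0.328 for 100 pages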
Comment by Jimmc414 3 days ago
Gemini Flash 2.5 or 3 with thinking gave the best results.
Comment by sixtyj 3 days ago
Tesseract is a cheap solution as it doesn’t touch any LLM.
For invoices, Gemini Flash is really good, for sure, and you receive “sorted” data as well. So definitely thumbs up. I use it for transcription of difficult magazine layouts.
For legally sensitive usage, since companies don’t like to share financial data with Google, I think it is better to use a local model.
Ollama or Hugging Face have a lot of them.
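A minimal sketch of the Tesseract route via pytesseract, assuming the PDF is rasterized first (pdf2image requires poppler, pytesseract requires the tesseract binary):

    from pdf2image import convert_from_path  # pip install pdf2image
    import pytesseract                       # pip install pytesseract

    def pdf_to_text(path: str) -> str:
        pages = convert_from_path(path, dpi=300)  # one PIL image per page
        return "\n\n".join(pytesseract.image_to_string(p) for p in pages)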
Comment by saaaaaam 3 days ago
Discussion is here: https://news.ycombinator.com/item?id=45652952
Comment by jasonni 3 days ago
There is a pipeline solution built from multiple small task-specific models that can run on CPU only: https://github.com/RapidAI/RapidOCR
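Roughly how that looks in use, per the RapidOCR readme (the API may have changed since):

    from rapidocr_onnxruntime import RapidOCR  # pip install rapidocr_onnxruntime

    engine = RapidOCR()                  # detection + classification + recognition, CPU-only ONNX
    result, elapse = engine("page.png")  # result: list of [box, text, score], or None
    for box, text, score in result or []:
        print(text, score)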