Show HN: Extract (financial) data from emails with local LLM
Posted by brainless 1 hour ago
I wanted all my emails (and files) scanned for financial data: transactions, bills (some of which I may not have paid). I wanted this to run entirely locally, without depending on a Large Language Model from a cloud provider.
I initially started with Google Gemini 3 Flash, but I switched to Ollama + Ministral 3:3b. The extraction is not exhaustive and there is much to improve, but it is working.
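For anyone curious how talking to a local model looks: Ollama exposes a small HTTP API on localhost, so the extraction call can be a plain JSON POST. This is a minimal sketch, not dwata's actual code — the prompt wording and the `extract_fields` helper are my own illustration; only the Ollama endpoint and request shape come from Ollama's API.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_extraction_request(model, email_text):
    """Build a request asking the local model to point out financial fields.
    The prompt here is illustrative, not dwata's actual prompt."""
    prompt = (
        "Find any financial data in this email (amounts, vendors, dates). "
        "Reply as JSON mapping field names to the exact text spans.\n\n" + email_text
    )
    # stream=False returns one JSON object; format="json" asks Ollama to
    # constrain the model's reply to valid JSON.
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}

def extract_fields(model, email_text):
    """POST to the local Ollama server and return the model's JSON reply."""
    payload = json.dumps(build_extraction_request(model, email_text)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(json.load(resp)["response"])
```

Nothing leaves the machine: the only network hop is to the Ollama server running locally.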
dwata runs locally: a web backend with a GUI in the browser. It connects to email accounts and downloads the messages. Then we can run the financial template detection: it groups similar-looking emails by sender and sends a sample from each cluster to the LLM agent. The LLM is asked to pick out the parts of the text that look like the data we are after. dwata then searches the email for the variables/values the LLM returned, creates a template by replacing that data with template tags, and saves the template to the DB. When extracting, dwata parses the data from each email using its template.
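The template step above can be sketched in a few lines. This is a toy version under my own assumptions (the function names, tag syntax, and sample email are all hypothetical, not dwata's): replace the LLM-identified values in a sample email with tags, then compile the template into a regex with named groups to parse the sender's other emails without calling the LLM again.

```python
import re

def make_template(sample_text, fields):
    """Replace LLM-identified values in a sample email with template tags."""
    template = sample_text
    for name, value in fields.items():
        template = template.replace(value, "{{" + name + "}}")
    return template

def template_to_regex(template):
    """Compile a template into a regex with one named capture group per tag."""
    pattern = re.escape(template)
    # re.escape turns {{name}} into \{\{name\}\}; swap each for a capture group
    pattern = re.sub(r"\\{\\{(\w+)\\}\\}", r"(?P<\1>.+?)", pattern)
    return re.compile(pattern, re.DOTALL)

# One sample email plus the fields the LLM pointed out in it
sample = "You paid $42.50 to Acme Coffee on 2024-05-01."
fields = {"amount": "$42.50", "vendor": "Acme Coffee", "date": "2024-05-01"}
tmpl = make_template(sample, fields)
# tmpl == "You paid {{amount}} to {{vendor}} on {{date}}."

# Every later email from the same cluster is parsed with plain regex, no LLM
m = template_to_regex(tmpl).match("You paid $13.00 to Corner Deli on 2024-06-12.")
print(m.groupdict())
# → {'amount': '$13.00', 'vendor': 'Corner Deli', 'date': '2024-06-12'}
```

The payoff of this design is cost: the LLM runs once per sender cluster, and the cheap regex pass handles the thousands of structurally identical emails.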
Roadmap: there is a long way to go; the extractor needs to work much, much better. dwata will also work on files (bank/CC statements) soon.
I want to extract vendors, businesses, contacts, events, places, etc. Connect to different APIs and process everything locally.
dwata will be able to download and process data from Hacker News API too (or other similar sources) - extract entities you care about.
Eventually, it will use only Ollama/llama.cpp with models that fit 6-8 GB graphics cards or 16 GB unified memory!
Comments
Comment by yubainu 1 hour ago