Show HN: I am building a map of people who lived in the Roman Empire
Posted by metiscus 6 days ago
Driving home from work one day, I wanted to know how many people who lived during the Roman era we know by name. Searching around, I found lists of consuls and officials, but nothing that covered ordinary people, or even most people, such as freedmen and slaves. So I ended up building a pipeline to process the more than 500k Latin inscriptions in the Epigraphic Database Clauss-Slaby https://edcs.hist.uzh.ch/en/ and extract the names of people (and attempt to cluster them, but this is a work in progress).
There are databases where classicists have done this manually for specific regions; Trismegistos https://www.trismegistos.org/ and Latin Inscriptions of the Roman Empire (LIRE) https://pure.au.dk/portal/en/publications/latin-inscriptions... are the two major efforts I found. But there doesn't seem to be a project that did what I set out to do, although I have read in some places that it was believed to be possible.
I am not a classicist or a web developer, but I have Claude and Gemini and I can sort of read basic Latin, so I set to work. I used LIRE and another database as ground truth and built a pipeline to extract and process the inscriptions to recover the names. The process I developed uses a high-end LLM like Sonnet or Gemini Pro to supervise the extraction and tuning process on a regional basis until the obvious error rate is reasonable. So far, reasonable to me means less than 1-2% in the smaller initial samples of 100-500 and no observed systemic issues. Different regions often need different prompts, so this basically became an exercise in letting the higher-level AI tune the prompt for the lower-level AI. Measured against LIRE, the extraction produces an F1 score between 0.64 and 0.87, but take this with a grain of salt.
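For anyone unfamiliar with the F1 figure quoted above, here is a minimal sketch of how such a score could be computed for one inscription. It assumes exact matching on lowercased, whitespace-normalized name strings; the post does not say how names are actually matched against LIRE, so `normalize` and `name_f1` are hypothetical helpers, and the sample lists are illustrative rather than real pipeline output.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(name.lower().split())


def name_f1(extracted: list[str], reference: list[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for extracted names against a reference list.

    Assumption: a name counts as correct only on an exact normalized match.
    """
    pred = {normalize(n) for n in extracted}
    gold = {normalize(n) for n in reference}
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data: one misspelled name and one missed name.
extracted = ["Caius Kanulanius Eumitus", "Nansinia Verecunda", "Lauzenia Matrona"]
reference = ["Caius Kanulanius Eumitus", "Nansinia Verecunda",
             "Laulenia Matrona", "Laulenia Naxina"]

precision, recall, f1 = name_f1(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With exact matching, a single wrong letter (Lauzenia for Laulenia) costs both a false positive and a false negative, which is one reason a score like this should be read with caution.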
Once I had done a few regions, I wanted to see the work, so I threw together a pretty crude website. Since I am not a web developer, it was especially crude in how it accessed its data. It does look cool, and I also added summarization and machine translation to each entry. I wanted to eventually get feedback from an actual team of classicists and make the website work better, so I am rewriting it as we speak. The new version is broadly functional now, with a few extra bugs but substantially better performance than the old one. All entries link back to the proper sources. The old web app also linked to several additional sources where the data was present, but I haven't gotten that working again on the new one yet. (The old web interface is still available at https://roman-names.com, but I will warn you it is clunky and not mobile friendly at all.)
Key findings so far:
AI-supervised AI extraction saved me time. I was manually tuning things for a while, and then I turned my instructions into a runbook that I feed to the big AI and let it go with sparse oversight from me.
The extraction improved significantly (by about 10 F1 points) when I fed the model the raw text including the markers, vs a cleaned up version of the text.
I just thought it was a cool little project and wanted to share. If you happen to work in any adjacent space and there is something I could do better, let me know.
Comments
Comment by jnovek 3 days ago
Could you make the dots smaller in the updated UI? I didn’t realize at first that you were using an actual map of Roman provinces.
My eyesight isn’t great and it would help if you used a political map rather than terrain. I’m not sure what’s out there for ancient Roman map tiles, though.
I’m not so much of an antiquity scholar AND I’m an American so my European geography isn’t perfect. It would be neat to be able to flip to a modern map, too, so I can see where things are in terms of modern landmarks.
You’re not getting a ton of comments so far, but FWIW these are the kinds of projects I come to HN for. I’ve been getting into opera lately and suddenly classical antiquity is very relevant to my interests. I’m going to keep this in my bookmarks, I’m finding the tangential historical stuff related to opera is drawing me in nearly as much as the music.
I’m also going to pass it on to an academic friend of mine who is working in an unrelated field but might find similar techniques useful.
Finally, when I first opened the map, I recognized the basic shape of the peak Roman Empire in the dots! I love when data does that kind of thing.
Thank you again for sharing this very cool project.
Comment by metiscus 3 days ago
Comment by retmarut 41 minutes ago
Comment by cwnyth 3 days ago
Comment by metiscus 3 days ago
1. Laepoca / Laepocus — Piquentum, Venetia et Histria (1–50 AD)
Three family members: two women (Laepoca Regilia, Laepoca Tuia) and a man (Metellus Laepocus). The nomen appears in both feminine and masculine forms in the same inscription, pointing to a genuine local gentilicium, likely of Istrian or Liburnian origin.
https://new.roman-names.com/#edcs_id=EDCS-04200530
It looks like my auto-translation and summarization layer is hallucinating on this entry, but the extraction appears correct. I'll flag it for the next run.
2. Tocernius — Eraclea Veneta, Venetia et Histria (3rd c. AD)
Father (C. Tocernius Hermeros) and son (C. Tocernius Maximianus), the latter a soldier of Legio II Italica. Probably a Venetic name surviving into the imperial period.
https://new.roman-names.com/#edcs_id=EDCS-04200461
Here, the auto-translate and summary worked as intended, although they do garble the dedication into the status.
3. Laulenia — Thibilis, Numidia
Two sisters, Laulenia Matrona and Laulenia Naxina, daughters of the same Marcus. The name looks Berber/Numidian in origin. (I should note that our pipeline transcribed the nomen as Lauzenia; the raw EDCS text reads Laulenia, which is probably the correct form.)
https://new.roman-names.com/#edcs_id=EDCS-13500401
The auto-translate and summary layers do not make this error, only the name extraction layer does. I have flagged the entry and am diagnosing it.
4. Kanulanius / Nansinia — Flavia Solva, Noricum
Father (C. Kanulanius Eumitus) and son (C. Kanulanius Nepos, a soldier of Ala III Thracum). The K-spelling may reflect local Celtic orthographic convention. The wife's nomen, Nansinia, also appears unattested in standard sources and may be a second find in the same inscription.
https://new.roman-names.com/#edcs_id=EDCS-14500644
Here there is an issue: I think the processing for the web is feeding interpreted text into the raw extraction field, since my displayed raw text seems to be an expanded form of the EDCS text.
Mine:
Caius Kanulanius Eumitus vivus fecit sibi et Nansiniae Verecundae coniugi et Caio Kanulanio Nepoti filio militi alae III Thracum annorum XXV stipendiorum VI loco et impensa Anni Festi
EDCS:
C(aius) Kanulani/us Eumitus / v(ivus) f(ecit) sibi et / Nansiniae / Verecundae con(iugi) / et C(aio) Kanulanio / Nepoti f(ilio) mil(iti) alae III / Thrac(um) an(norum) XXV stip(endiorum) VI / loco et impensa / Anni Festi
Comment by cwnyth 3 days ago
Also, you might want to include the source from EDCS. #3 above comes from ILAlg, and EDCS has a key for all the collections and their abbreviations. This will help someone be able to track down the original inscription more easily.
1. That first one is rough, and the translation is broken (it doesn't even translate Surus' name), but you got the people down. Regilia is just a guess, though.
3. Yep, Laulenia is the original name. Seems like AI is hallucinating here.
4. Have you thought about code that strips the parentheses first, instead of letting AI do it? Also, loco et impensa is something like "grave site and expense", not "expense and initiation." Locus means "place", and in epitaphs it often just refers to the burial place.
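To make the suggestion in point 4 concrete, here is a rough sketch of stripping the markers in code rather than with a model. It only handles the two conventions visible in the EDCS text quoted above (parentheses around expanded abbreviations, and `/` for line breaks, including breaks in the middle of a word); EDCS uses other markers that this ignores, and `expand_edcs` is a hypothetical name, not something from the project's repository.

```python
import re


def expand_edcs(text: str) -> str:
    """Turn marked-up EDCS text into plain running text.

    Assumptions: "(...)" wraps the expanded part of an abbreviation,
    " / " is a line break between words, and a bare "/" is a line break
    inside a word. Other epigraphic markers are left untouched.
    """
    # Keep the letters inside parentheses, drop the parentheses themselves.
    text = re.sub(r"\(([^()]*)\)", r"\1", text)
    # A line break between words becomes a single space.
    text = re.sub(r"\s+/\s+", " ", text)
    # A line break inside a word is removed so the word is rejoined.
    text = text.replace("/", "")
    # Collapse any leftover runs of whitespace.
    return " ".join(text.split())


edcs = ("C(aius) Kanulani/us Eumitus / v(ivus) f(ecit) sibi et / Nansiniae / "
        "Verecundae con(iugi) / et C(aio) Kanulanio / Nepoti f(ilio) mil(iti) alae III / "
        "Thrac(um) an(norum) XXV stip(endiorum) VI / loco et impensa / Anni Festi")

print(expand_edcs(edcs))
```

On this inscription the result matches the expanded text shown under "Mine" above, so a check like this could also be used to tell whether a "raw" field has already been expanded. Since the original post reports better extraction when the model sees the raw markers, this would probably fit best on the display or comparison side, not as a preprocessing step for the model.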
Comment by busyant 3 days ago
My son just graduated with a double major in classics and molecular biology (doing informatics work), so maybe he could help! lol
Comment by metiscus 3 days ago
Comment by trevoragilbert 3 days ago
Comment by metiscus 3 days ago
Comment by thom 3 days ago
Comment by doodlesdev 3 days ago
I've read the README in the feat-api branch and, from what I understand, you've already assessed that false negatives are not a model failure, but I'm not sure I understand why (haven't spent that much time looking at it though, just curious to hear from you).
This is a really cool project, by the way! In my opinion this is a place where LLMs shine: producing the work of hundreds of hours of manual human labor much more quickly and cheaply, on something that no one would otherwise bother to do!
Comment by metiscus 3 days ago
Comment by yubblegum 3 days ago
And just now I am watching I, Claudius.
Comment by Insanity 3 days ago
I just finished reading “I, Claudius” and “Claudius the God” this month. Didn’t know there was a series / movie, would you recommend it?
(I highly recommend the books FWIW, although I prefer the more modern writing style of the Cicero trilogy)
Comment by yubblegum 3 days ago
Comment by aduwah 3 days ago
Comment by goldfishgold 3 days ago
In general the word 'prosopography' might be helpful for you. There's been lots of work over the centuries on analyzing large groups of people in antiquity.
Comment by daviTeodoro 3 days ago
Comment by aspenmartin 3 days ago
Comment by metiscus 3 days ago
Comment by tosti 3 days ago
See e.g. https://upload.wikimedia.org/wikipedia/commons/c/ce/Peace_of...
From: https://commons.wikimedia.org/wiki/Atlas_of_European_history
Comment by metiscus 5 days ago
Comment by metiscus 3 days ago
Comment by metiscus 3 days ago
For reasons, the main development right now is on a branch. Also, the browse feature is live, which allows better searching.
Comment by OJFord 3 days ago
Comment by metiscus 3 days ago
Comment by 1e1a 3 days ago
Comment by metiscus 1 day ago
Comment by frereubu 3 days ago
Comment by ingvay7 3 days ago
Comment by avyeed_desa 3 days ago
The ones around my place all use EDH, which also has a map feature, but not as intuitive as this! Reminds me of vici.org
Comment by metiscus 3 days ago
Comment by andai 3 days ago
Comment by metiscus 3 days ago
EDIT My instructions to the supervising LLM are in here https://github.com/metiscus/roman-names/blob/feature/webapp-...
Comment by metiscus 3 days ago
The running version on new. is the webapp branch. Eventually I will get it all fixed up.
Comment by Xotic007 2 days ago
Comment by CodeByBryant 3 days ago
Comment by tonymet 3 days ago
Comment by tonymet 3 days ago
Comment by metiscus 3 days ago
https://www.successoterra.net/en
The Vesuvius scrolls have been partially decoded with some interesting results. https://www.smithsonianmag.com/smart-news/three-students-dec...
The Vindolanda tablets are constantly being worked on as well https://www.heritagedaily.com/2017/07/roman-tablets-unearthe...
Comment by tonymet 2 days ago
Comment by oezi 3 days ago
Comment by countrymile 3 days ago
Comment by metiscus 3 days ago
Comment by jdthedisciple 3 days ago
So I couldn't even check it out properly.
Comment by bvan 3 days ago
Comment by ynxshiny 3 days ago
Comment by apiorno 19 hours ago
Comment by bshivarthy 2 days ago
Comment by AzizBytes 3 days ago
Comment by raychis 3 days ago
Comment by misano 3 days ago