Show HN: Marmot – Single-binary data catalog (no Kafka, no Elasticsearch)
Posted by charlie-haley 7 days ago
Comments
Comment by charlie-haley 7 days ago
Marmot is a single Go binary backed by Postgres. That's it!
It already supports:
- Full-text search across tables, topics, queues, buckets, and APIs
- Glossary and asset-to-term associations
- A flexible API, so it can support almost any data asset
- Terraform/Pulumi/CLI for managing the catalog as code
- 10+ plugins (and growing)
Live demo: https://demo.marmotdata.io
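For the curious: the "no Elasticsearch" part works because Postgres's built-in full-text search (tsvector/tsquery) goes a long way at catalog scale. Here's a minimal sketch of that approach in Go, purely illustrative; the `assets` table, columns, and connection string below are made up, not Marmot's actual schema:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // Postgres driver
)

// Rank assets against a free-text query using Postgres's built-in
// full-text search: no separate search engine required.
const searchSQL = `
SELECT name,
       ts_rank(to_tsvector('english', name || ' ' || coalesce(description, '')),
               plainto_tsquery('english', $1)) AS rank
FROM   assets
WHERE  to_tsvector('english', name || ' ' || coalesce(description, ''))
       @@ plainto_tsquery('english', $1)
ORDER  BY rank DESC
LIMIT  10`

func main() {
	// Hypothetical database and table -- not Marmot's real schema.
	db, err := sql.Open("postgres", "postgres://localhost/catalog?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.Query(searchSQL, "customer data")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	// Print matches best-first, the same shape of result a catalog
	// search page would render.
	for rows.Next() {
		var name string
		var rank float64
		if err := rows.Scan(&name, &rank); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-40s rank=%.3f\n", name, rank)
	}
}
```

In practice you'd index the tsvector (a GIN index on an expression or a generated column) so search stays fast as the catalog grows.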
Comment by pratio 7 days ago
What we missed in Marmot was existing integrations with Airflow, and other plugins like Tableau and Power BI, as well as features such as SSO and MCP.
We're an enterprise and needed a more mature product. Fingers crossed Marmot gets there soon.
Comment by charlie-haley 7 days ago
SSO is sort of available but undocumented; it currently only supports Okta, but I'm working on fleshing a lot of this out in the next big release (along with MCP).
Comment by pratio 7 days ago
I saw the plugin system, but having never written any production-ready Go code, it doesn't make sense for me to use an LLM to generate code and open pull requests that you then need to spend time reviewing.
Marmot is a wonderful project and I'm sure it'll be worth the wait.
Comment by charlie-haley 7 days ago
Also, thanks for pointing out the issue with the docs, I'll get that fixed!
Comment by paddy_m 7 days ago
Also, what key decisions do other data catalogs make compared to your choices? What led to those decisions, and what is the benefit to users?
Comment by charlie-haley 7 days ago
I like to think of Marmot as more of an "operational" catalog, focused on usability for individual contributors and not just data engineers. The key emphasis is simplicity, in terms of both deployment and day-to-day use.
Comment by hilti 7 days ago
The demo is always incredible - finally, we’ll know where our data lives! No more asking “hey does anyone know which table has the real customer data?” in Slack at 3pm.
Then reality hits.
Week 1 looks great. Week 8, you search “customer data” and get back 47 tables with brilliant names like `customers_final_v3` and `cust_data_new`. Zero descriptions because nobody has time to write them.
You try enforcing it. Developers are already swamped and now you’re asking them to stop and document every column? They either write useless stuff like “customer table contains customers” or they just… don’t. Can’t really blame them.
Three months in, half the docs are outdated.
I don’t know. Maybe it’s a maturity thing? Or maybe we’re all just pretending we’re organized enough for these tools when we’re really not.
Comment by NortySpock 7 days ago
Versus the conceptually simpler "one binary, one container, one storage volume/database" model.
I acknowledge it's a false choice and a semi-silly thing to fixate on (how do you perf-tune ingestion-queue problems vs write problems vs read problems in a single Go binary?)...
But, like, I have 10 different systems I'm already debugging.
Adding another one, like a data catalog that's supposed to make life easier, and discovering I now have 5-subsystems-in-a-trenchcoat to potentially debug means I'm spending even more time babysitting the metadata manager rather than doing data engineering _for the business_.
Comment by charlie-haley 7 days ago
I'd also love to have some native integrations beyond Airflow. Once I've matured the existing plugin ecosystem a bit more, it's high on my list (along with column-level lineage).
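For anyone eyeing the plugin system: a discovery plugin is conceptually just a small interface. Roughly this shape in Go — wait, roughly like the sketch below (simplified, with illustrative names rather than the exact API):

```go
package plugin

import "context"

// Asset is a generic catalog entry: a table, topic, queue, bucket, or API.
// These types are illustrative, not the exact plugin API.
type Asset struct {
	Name        string
	Type        string            // e.g. "table", "topic", "bucket"
	Description string
	Metadata    map[string]string // source-specific extras
}

// Source is the shape a discovery plugin takes: connect to an external
// system, enumerate what it finds, and hand the results to the catalog.
type Source interface {
	// Name identifies the plugin, e.g. "postgres" or "kafka".
	Name() string
	// Discover connects to the underlying system and returns its assets.
	Discover(ctx context.Context) ([]Asset, error)
}
```

The catalog core then handles dedup, search indexing, and lineage stitching, so each plugin only has to answer "what assets does this system contain?"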