Show HN: I built a real-time OSINT dashboard pulling 15 live global feeds

Sup HN,

I got tired of bouncing between Flightradar, MarineTraffic, and Twitter every time something kicked off globally, so I wrote a dashboard to aggregate it all locally. It’s called Shadowbroker.

I’ll admit I leaned way too hard into the "movie hacker" aesthetic for the UI, but the actual pipeline underneath is real. It pulls commercial/military ADS-B, the AIS WebSocket stream (25,000+ ships), N2YO satellite telemetry, and GDELT conflict data into a single MapLibre instance.

Getting this to run without melting my browser was the hardest part. I'm running this on a laptop with an i5 and an RTX 3050, and initially, dumping 30k+ moving GeoJSON features onto the map just crashed everything. I ended up having to write pretty aggressive viewport culling, debounce the state updates, and compress the FastAPI payloads by like 90% just to make it usable.
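
For anyone curious what culling and payload compression can look like on the Python side, here is a minimal sketch. It is not the code from the repo: the function names, the bounding-box parameters, and the sample data are all made up for illustration, and it ignores viewports that cross the antimeridian.

```python
import gzip
import json


def cull_to_viewport(features, west, south, east, north):
    """Keep only Point features whose [lon, lat] falls inside the bounding box."""
    visible = []
    for feature in features:
        lon, lat = feature["geometry"]["coordinates"][:2]
        if west <= lon <= east and south <= lat <= north:
            visible.append(feature)
    return visible


def compress_payload(features):
    """Serialize a FeatureCollection without extra whitespace, then gzip it."""
    collection = {"type": "FeatureCollection", "features": features}
    raw = json.dumps(collection, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


# Hypothetical sample data: 1,000 points laid out on a grid.
sample = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [float(i % 100), float(i // 100)]},
        "properties": {"id": i},
    }
    for i in range(1000)
]

# Only features inside the (assumed) current map viewport are sent to the client.
visible = cull_to_viewport(sample, west=0, south=0, east=9, north=4)
payload = compress_payload(visible)
```

The actual compression ratio depends heavily on how repetitive the feature properties are, so treat any specific percentage as something to measure rather than assume.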

My favorite part is the signal layer—it actually calculates live GPS jamming zones by aggregating the real-time navigation degradation (NAC-P) of commercial flights overhead.
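
Roughly, the idea can be sketched like this. This is an illustration under assumptions, not the repo's implementation: the grid size, the NACp threshold, the minimum aircraft count, and the field names (`lat`, `lon`, `nac_p`) are all hypothetical. The only fixed fact is that NACp is the position accuracy category aircraft report over ADS-B, where lower values mean worse accuracy.

```python
import math
from collections import defaultdict


def jamming_cells(aircraft, cell_deg=1.0, nacp_threshold=8, min_aircraft=3, min_fraction=0.5):
    """Return {grid_cell: degraded_fraction} for cells where many aircraft report low NACp."""
    totals = defaultdict(int)
    degraded = defaultdict(int)
    for ac in aircraft:
        # Bin each aircraft into a lat/lon grid cell of cell_deg degrees.
        cell = (math.floor(ac["lat"] / cell_deg), math.floor(ac["lon"] / cell_deg))
        totals[cell] += 1
        if ac["nac_p"] < nacp_threshold:
            degraded[cell] += 1

    flagged = {}
    for cell, count in totals.items():
        fraction = degraded[cell] / count
        # Require several aircraft so one faulty transponder does not flag a cell.
        if count >= min_aircraft and fraction >= min_fraction:
            flagged[cell] = fraction
    return flagged


# Hypothetical reports: one cell with mostly degraded accuracy, one healthy cell.
reports = [
    {"lat": 35.2, "lon": 33.1, "nac_p": 2},
    {"lat": 35.5, "lon": 33.9, "nac_p": 3},
    {"lat": 35.9, "lon": 33.4, "nac_p": 10},
    {"lat": 35.1, "lon": 33.7, "nac_p": 1},
    {"lat": 51.5, "lon": 0.1, "nac_p": 10},
    {"lat": 51.2, "lon": 0.5, "nac_p": 11},
    {"lat": 51.9, "lon": 0.9, "nac_p": 9},
]
zones = jamming_cells(reports)
```

Low NACp can also come from equipment faults or poor satellite geometry, which is why the sketch only flags cells where several aircraft agree.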

It’s Next.js and Python. I threw a quick-start script in the releases if you just want to spin it up, but the repo is open if you want to dig into the backend.

Let me know if my MapLibre implementation is terrible, I'm always looking for ways to optimize the rendering.

Comments

Comment by afatparakeet 1 day ago

Optimizing some of that geojson into realtime tiles is a really fun and engaging project.

Have you seen these projects?

https://github.com/protomaps/PMTiles

https://github.com/maplibre/martin

Comment by vancecookcobxin 1 day ago

They are definitely on the horizon! I am a HUGE fan of both of those projects and they are definitely on the roadmap for the architecture...

Right now, ShadowBroker is really optimized for 'blinking blip' real-time radar tracking (streaming the raw GeoJSON payload from the FastAPI backend directly to MapLibre every 60s), so we get as close as we can to smooth 60fps entity animations across the map.

Moving to something like Martin would be incredible for handling EVEN MORE entities if we start archiving historical flight and AIS data into a proper PostGIS database, but the trade-off of having to invalidate the vector tile cache every few seconds for live-moving targets makes it a bit overkill right now....

Comment by afatparakeet 1 day ago

Yeah less ideal for the realtime data but could be useful for lightening the load of certain more static layers.

Great project, will be contributing!

Comment by vancecookcobxin 1 day ago

Glad to have you aboard!

Comment by KronisLV 1 day ago

Protomaps is really cool also when you just want maps for a country and to serve them without too much of a hassle, their CLI has pretty much everything you need: https://docs.protomaps.com/pmtiles/cli

I set that up for an agricultural project a while back.

Comment by totetsu 1 day ago

Is this kind of Hyper-awareness of data you can't actually do anything about even a desirable thing, or just a pathway into a hole of hyper-alert stress and low Self-efficacy?

Comment by ahannigan 1 day ago

Looks similar to https://monitor-the-situation.com/

Comment by himmi-01 13 hours ago

This one is so good. Bookmarked. Thanks. I think the only thing I need now is to enter a city name and it gathers data if available.

Comment by vavkamil 1 day ago

You leaked `./frontend/.env.local` & `./backend/.env` inside `ShadowBroker_v0.1.zip` in the first commit.

Comment by tfghhjh 1 day ago

thats why its called osint

everything is open source

Comment by Escapade5160 8 hours ago

Whole thing feels very vibe coded. Even OP's post here.

Comment by DetroitThrow 1 day ago

the real OSINT is always in the comments

Comment by porridgeraisin 1 day ago

What made you check that

Comment by wildrhythms 2 hours ago

It's both the first and last thing to check

Comment by stef25 5 hours ago

It's called Hacker News

Comment by CountGeek 1 day ago

This is neat. It reminds me of this https://curves-voluntary-livecam-sandra.trycloudflare.com/

Comment by born-jre 1 day ago

i was building sth like this

https://github.com/blue-monads/potato-apps/tree/master/cimpl...

i should finish but have not had time

Comment by rationalist 1 day ago

Risky click. (It's okay.)

Comment by tylervigen 1 day ago

> ShadowBroker is a real-time, full-spectrum geospatial intelligence dashboard

You might consider changing this to a more accurate headline, like "Air and Space domain awareness."

"Full spectrum Geospatial intelligence" most commonly refers to full color satellite photos (sometimes including near infrared).

In the Geospatial world, "spectrum" almost always takes on its literal meaning - the spectrum of light. And "Geospatial intelligence" refers to intelligence gathered from Geospatial platforms, not intelligence about the locations of those platforms.

Comment by 4mitkumar 1 day ago

Very cool! Although, the concept, the feeds, the design and everything reminds me of https://www.worldmonitor.app/ - also live and deployed btw, if you want to check out the interface.

Comment by ionwake 1 day ago

Really cool thanks for sharing. What are the API costs like if i ran this for a couple hours a day for a month? Is it affordable?

Comment by vancecookcobxin 1 day ago

It's all free baby lol

Comment by ryanholtdev 21 hours ago

Neat aggregation. One thing worth adding to the feed pipeline: a staleness signal. Several of these sources (threat feeds especially) have update cadences measured in hours, not seconds. Displaying last-updated timestamps per source would help users weight freshness vs. noise when triaging.
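
A minimal sketch of such a staleness signal, assuming the backend records a last-updated epoch timestamp per source. The source names, the cadences, and the "two missed cycles" rule are made up for illustration:

```python
import time


def staleness_report(last_updated, expected_interval, now=None):
    """Label each source 'fresh' or 'stale' given its expected update cadence in seconds."""
    now = time.time() if now is None else now
    report = {}
    for source, updated_at in last_updated.items():
        age = now - updated_at
        # Assumption: a source counts as stale after missing two expected update cycles.
        status = "stale" if age > 2 * expected_interval[source] else "fresh"
        report[source] = {"age_seconds": age, "status": status}
    return report


# Hypothetical sources and cadences, evaluated at a fixed "now" for reproducibility.
report = staleness_report(
    last_updated={"adsb": 9990, "gdelt": 6000},
    expected_interval={"adsb": 60, "gdelt": 900},
    now=10000,
)
```

The UI could then render `age_seconds` next to each layer toggle and dim anything marked stale.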

Comment by rustyhancock 1 day ago

There's no data when I tried it on a Windows 11 PC. It seemed to install all deps and the front end is served, but the dossier says intel unavailable.

No planes etc.

No helpful output in the command window.

Seems fun but doesn't seem to be working.

Comment by vancecookcobxin 1 day ago

Ah, that's my fault for not making the error handling clearer in the UI. If the map is blank, it usually means the backend is missing the .env file with the free API keys (AISSTREAM_API_KEY and N2YO_API_KEY), so it's silently failing to fetch the streams.

Did the terminal throw any Python FastAPI errors, or did it just serve the Next.js frontend? I'm going to push an update later today to show a prominent "Backend Disconnected / Missing API Keys" warning on the UI so it doesn't just look dead. Thanks for testing it!
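
Something along these lines at backend startup would surface the problem instead of failing silently. This is only a sketch: the two key names are the ones mentioned above, while the function name and the idea of passing in an environment mapping are assumptions.

```python
import os

# Key names as mentioned in this thread; the real backend may require others too.
REQUIRED_KEYS = ("AISSTREAM_API_KEY", "N2YO_API_KEY")


def missing_api_keys(env=None):
    """Return the required keys that are absent or blank in the given environment."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


# Example with a hypothetical environment that only defines one key.
missing = missing_api_keys({"AISSTREAM_API_KEY": "abc123"})
if missing:
    print("Backend is missing API keys: " + ", ".join(missing))
```

The same list could be returned from a health endpoint so the frontend can show the warning banner.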

Comment by AH4oFVbPT4f8 1 day ago

On the topic of API Keys, for Opensky it's OPENSKY_CLIENT_ID and OPENSKY_CLIENT_SECRET, the readme has OPENSKY_USERNAME and OPENSKY_PASSWORD

Comment by porridgeraisin 1 day ago

The perils of vibe coding.

Comment by rustyhancock 1 day ago

Looks like I had to use python-3.11 and install a few dependencies.

Comment by AH4oFVbPT4f8 1 day ago

Change the requirements.txt in the backend to the following

fastapi==0.103.1

uvicorn==0.23.2

yfinance>=0.2.40

feedparser==6.0.10

legacy-cgi==2.6.1

requests==2.31.0

apscheduler==3.10.3

pydantic==2.11.0

pydantic-settings==2.8.0

playwright>=1.58.0

beautifulsoup4>=4.12.0

sgp4>=2.22

cachetools>=5.3.0

cloudscraper>=1.2.71

reverse_geocoder>=1.5.1

lxml>=5.0

python-dotenv>=1.0

and be on python 3.13 and it should get you up and running

Comment by edwcross 1 day ago

Thanks, it helped some, but I'm still having an error:

  [1] node:internal/modules/cjs/loader:1368
  [1]   throw err;
  [1]   ^
  [1] 
  [1] Error: Cannot find module '/home/user/shadow/start-backend.js'
  [1]     at Function._resolveFilename (node:internal/modules/cjs/loader:1365:15)
  [1]     at defaultResolveImpl (node:internal/modules/cjs/loader:1021:19)
  [1]     at resolveForCJSWithHooks (node:internal/modules/cjs/loader:1026:22)
  [1]     at Function._load (node:internal/modules/cjs/loader:1175:37)
  [1]     at TracingChannel.traceSync (node:diagnostics_channel:322:14)
  [1]     at wrapModuleLoad (node:internal/modules/cjs/loader:235:24)
  [1]     at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:171:5)
  [1]     at node:internal/main/run_main_module:36:49 {
  [1]   code: 'MODULE_NOT_FOUND',
  [1]   requireStack: []
  [1] }

Comment by trick-or-treat 1 day ago

This is fun, Playwright is a python wrapper for a node lib. So we have Next.js (full stack js/ts), with a Python backend (???), that uses a wrapper for a js/ts lib, all we need now is to shell out to node from inside that script and we have peak inception lol.

Comment by euroderf 8 hours ago

And it's not complete until it runs in an emulation layer.

Comment by spzb 1 day ago

Same on a Mac

Comment by DetroitThrow 1 day ago

Yeah this doesn't work on Mac either. This is just broken and nonfunctioning.

Comment by vancecookcobxin 1 day ago

Apparently, I had a bunch of front-end development scripts that were calling the Windows version of Python. Working on it now.

Comment by ryanholtdev 1 day ago

The multi-source aggregation approach is exactly right for this use case -- the value isn't any single feed, it's the correlation between them. Flight diversions, AIS gaps, and social spikes at the same coordinates at the same time tell a very different story than any one of those signals alone.

Curious whether you're doing any timestamp normalization across feeds. Marine AIS in particular can be spoofed or delayed, and correlated analysis gets messy fast if the time windows aren't aligned.
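
To make the question concrete, here is a minimal sketch of one way to normalize timestamps, assuming feeds deliver a mix of epoch seconds, epoch milliseconds, and ISO 8601 strings. The function names, the millisecond heuristic, and the 5-minute window are assumptions for illustration. It does nothing about spoofed or delayed reports.

```python
from datetime import datetime, timezone


def to_utc_epoch(value):
    """Normalize epoch seconds, epoch milliseconds, or an ISO 8601 string to UTC epoch seconds."""
    if isinstance(value, (int, float)):
        # Assumption: numbers above 1e11 are milliseconds, since 1e11 seconds is far in the future.
        return value / 1000.0 if value > 1e11 else float(value)
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def window_index(value, window_seconds=300):
    """Map a timestamp to a fixed-width time window so events from different feeds can be joined."""
    return int(to_utc_epoch(value) // window_seconds)


# The same instant expressed three ways, as different feeds might report it.
same_instant = [1700000000, 1700000000000, "2023-11-14T22:13:20Z"]
windows = {window_index(stamp) for stamp in same_instant}
```

Events that land in the same window and the same map cell become candidates for correlation; a sliding window would avoid splitting events that straddle a boundary.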

Comment by laborcontract 1 day ago

I've seen so many of these in the last week alone.

I need a realtime OSINT dashboard for OSINT dashboards.

Comment by nonameiguess 1 day ago

It's pretty interesting to see. My very first real software job was working on ground processing algorithms for the US Navy's Maritime Domain Awareness system, which is the "real" version of something like this that actually gives centimeter scale live activity detections of basically the entire world. The engineering effort that goes into something like that is immense. Bush announced in like 2004 or something and we didn't go into full operational capability until 2015. Thousands of developers across intel, military, commercial contractors, for over a decade, inventing and launching new sensor platforms, along with build outs of the data centers to collect, process, store, and make sense of all this.

I wish these weekend warriors would work on a project like that someday, to see what capabilities truly take. You want to know what's happening in the world, you need to place physical sensors out there, deal with the fact that your own signals are being jammed and blocked, the things you're trying to see are also trying to hide and disguise themselves.

The attention to detail is something I've never seen replicated outside. Every time we changed or put out a new algorithm, we had to process old data with it and explain to analysts and scientists every single pixel that changed in the end product and why.

Comment by vancecookcobxin 1 day ago

I get it! Unfortunately, you need a security clearance or a really fat wallet to get that kind of data. OSINT is a different thing.

Comment by the_real_cher 1 day ago

One guy vs the DoD

apples and oranges

Comment by operatingthetan 1 day ago

Which is the best one so far?

Comment by laborcontract 1 day ago

I'm going to have to create an Awesome Best OSINT dashboards github repo to answer that.

Comment by skinnymuch 1 day ago

Reminds me of all the Covid data trackers in mid 2020

Comment by efromvt 1 day ago

I'd be interested in just the data layer of this being extractable - will poke around at that. (frontend is fun, though!).

Comment by poemxo 1 day ago

Why the name Shadowbroker? It sounds a lot like the Shadow Brokers which is the hacker group that stole and published some NSA hacking tools.

Then again they were named after a video game character so it's probably fair.

Comment by lloeki 1 day ago

> they were named after a video game character

(spoiler alert if you ever intend to play ME)

https://masseffect.fandom.com/wiki/Shadow_Broker

Comment by coolius 1 day ago

i wish someone could deploy this somewhere so we can try it out without having to build it first

Comment by anigbrowl 1 day ago

> I’ll admit I leaned way too hard into the "movie hacker" aesthetic for the UI

Nothing wrong with that. Beats a boring corporate dashboard any day. Video game and similar interfaces work for a reason.

Comment by hettygreen 1 day ago

This looks really cool..

Let me ask a dumb question. Can this be run on a public server (I use dreamhost) with a web interface for others to see? Or is this strictly something that gets run on a local computer?

Comment by vancecookcobxin 1 day ago

Well, I have to make some modifications, but that isn't recommended right now because I have a settings option with the API key right there for the free world to see, lol. I will work on making a version for hosting it, though.

You can throw it on a server and run it for you to see (or anyone else if you trust people or don't care about losing your free API keys). It's just a standard Next.js and FastAPI stack, and there are Dockerfiles in the repo so it should be pretty straightforward to spin up on a cheap VPS (like a DigitalOcean droplet or Hetzner).

Honestly, if you just want to show it off to a few people, running it locally and exposing it with a Cloudflare Tunnel or Ngrok is probably the path of least resistance.

I WILL work on having a version to host it where users have to bring their own keys to see it in the future though

Comment by silverstream 1 day ago

Cloudflare Tunnel is solid for quick demos. One thing though — if you're planning the "bring your own keys" version, don't just throw them in a settings page. I went down that road and ended up with keys sitting in localStorage where any XSS could grab them. What worked better for me was having the backend hold the keys and issuing short-lived session tokens to the frontend. More moving parts but way less surface area if something goes wrong.
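
A stripped-down sketch of the short-lived token idea, using only the standard library. The token format, the names, and the 15-minute default TTL are assumptions for illustration, not what I ran in production; a real deployment would more likely use an established session or JWT library.

```python
import base64
import hashlib
import hmac
import secrets
import time

# Hypothetical: a signing secret generated at backend startup. API keys never leave the server.
SERVER_SECRET = secrets.token_bytes(32)


def issue_token(ttl_seconds=900, now=None):
    """Return 'expiry.signature', where the signature is an HMAC over the expiry time."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    payload = str(expires).encode("ascii")
    signature = hmac.new(SERVER_SECRET, payload, hashlib.sha256).digest()
    return payload.decode("ascii") + "." + base64.urlsafe_b64encode(signature).decode("ascii")


def verify_token(token, now=None):
    """Accept a token only if its signature matches and it has not expired."""
    try:
        expires_text, signature_text = token.split(".", 1)
        expires = int(expires_text)
        payload = expires_text.encode("ascii")
        signature = base64.urlsafe_b64decode(signature_text.encode("ascii"))
    except ValueError:
        # Malformed tokens (bad shape, non-numeric expiry, bad base64) are rejected.
        return False
    expected = hmac.new(SERVER_SECRET, payload, hashlib.sha256).digest()
    current = time.time() if now is None else now
    return hmac.compare_digest(signature, expected) and current < expires


token = issue_token(ttl_seconds=60, now=1000)
```

The frontend only ever holds `token`; if XSS steals it, the damage window is the TTL rather than the lifetime of the upstream API key.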

Comment by vancecookcobxin 1 day ago

Stellar advice! I will totally keep that in mind. Thanks!

Comment by Karrot_Kream 1 day ago

If you want to host for friends/trusted devices, you can put it on a Tailscale or Zerotier style network and just let trusted devices access the server, wrt the OP's point about open secrets. Or you could probably make a PR to load the settings from somewhere else.

Comment by pugworthy 1 day ago

I was hoping for something like the old Henchman's Helper site, which went offline around 2016-17.

Archive version...

https://web.archive.org/web/20120112012912/http://henchmansh...

Comment by cloudez 1 day ago

Bringing ADS-B, AIS, satellite telemetry, and GDELT into a single local dashboard is a great idea. I’ve already run it in my container environment.

Comment by garyfirestorm 1 day ago

> Do not use this tool for any operational, military, or intelligence purpose.

How long before we see this UI in some Iran related news story

Comment by blitzar 1 day ago

I dont think this will dethrone the three tabs of twitter feed as the war room data source.

https://www.yahoo.com/news/articles/why-f-ck-x-big-220249332...

@grok who should we boomb next?

Comment by vancecookcobxin 1 day ago

I REALLY, REALLY dont want it used for that type of stuff.

Comment by amelius 1 day ago

Does it show locations of datacenters?

Comment by darkce 9 hours ago

Nice!

Comment by david_shi 1 day ago

not knocking this specific implementation in any way, but it's crazy that live OSINT dashboards are now the demo project of choice vs. todo apps

Comment by euroderf 8 hours ago

Integrate the two and you have a global to-do dashboard fit for any wannabe world dominator.

Comment by blitzar 1 day ago

aggregating API data sources + visual display of data - honestly seems a good fit for a demo project.

Comment by rakag 1 day ago

    assessment = "ANALYSIS: "

    if any(k in keywords for k in ["strike", "missile", "attack", "bomb", "drone"]):

        assessment += f"{random.randint(75, 95)}% probability of kinetic escalation within 24 hours. Recommend immediate asset relocation from projected blast radius."

    elif...

Lol.

Comment by hofrogs 1 day ago

That whole code block is pretty funny with those random percentages. Looks like a prop made for a movie or something.

Comment by serf 1 day ago

cool idea.

first llm to stop using those damn colors for every single transparent modal in existence is going to be a big step forward.

Comment by raised_hand 1 day ago

Is this hosted anywhere?

Comment by 4mitkumar 1 day ago

Try this https://www.worldmonitor.app/ for a hosted version of this...from a different dev but very, very close.

Comment by fittingopposite 1 day ago

Website is down..

Comment by chid 1 day ago

Did I see this on X first?

Comment by whattheheckheck 1 day ago

Yoooo this is amazing... can you add rss feeds like feeder.co aggregating subreddits and groundnews articles embedded in here too?

And add chronological feeds of govtrack.us along with all politicians social media feeds

Comment by hbarka 1 day ago

Comment by operatingthetan 1 day ago

I don't understand why that youtuber was acting like spy satellites going over was such a big deal, they are going over the entire planet, all the time.

edit: no idea why they deleted the comment but they linked to this video https://www.youtube.com/watch?v=0p8o7AeHDzg

Comment by kjs3 1 day ago

Claude told him it was a big deal. Why would he question Claude.

Comment by crawfordcomeaux 1 day ago

I'm excited to see tooling of this nature and scope. Looking forward to seeing similar tooling oriented around all human needs so we can start tracking the meeting of needs to better meet needs, particularly in ways that don't require money.

Comment by jll29 1 day ago

Thanks for opening this up.

As was already said in one of the reference videos, it's impressive what one person can do.

But the next step is to define an architecture where authors can define/implement plug-ins with particular modular capabilities instead of one big monolith. For example, instead of front-end (GUI) and back-end (feeds), there ought to be a middle layer that models some of the domain logic (events: sources, filters, sinks; stories/time lines etc.).

I would like to see a plug-in for EMM (European Media Monitor) integrated, for instance ( https://emm.newsbrief.eu/NewsBrief/alertedition/en/ECnews.ht... ).
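
As a rough sketch of what I mean by sources, filters, and sinks (all names here are hypothetical; nothing like this exists in the repo as far as I know):

```python
from typing import Callable, Iterable

Event = dict
Source = Callable[[], Iterable[Event]]   # a plug-in that produces events
Filter = Callable[[Event], bool]         # a plug-in that decides whether to keep an event
Sink = Callable[[Event], None]           # a plug-in that consumes events (map layer, log, alert)


class Pipeline:
    """Middle layer: pulls events from every source, applies all filters, fans out to all sinks."""

    def __init__(self):
        self.sources = []
        self.filters = []
        self.sinks = []

    def run(self):
        delivered = 0
        for source in self.sources:
            for event in source():
                if all(keep(event) for keep in self.filters):
                    for sink in self.sinks:
                        sink(event)
                    delivered += 1
        return delivered


# Hypothetical plug-ins: a news source, a filter on event kind, and a list-backed sink.
def demo_news_source():
    yield {"kind": "news", "title": "Example headline"}
    yield {"kind": "weather", "title": "Example storm"}
    yield {"kind": "news", "title": "Another headline"}


collected = []
pipeline = Pipeline()
pipeline.sources.append(demo_news_source)
pipeline.filters.append(lambda event: event["kind"] == "news")
pipeline.sinks.append(collected.append)
delivered = pipeline.run()
```

An EMM plug-in would then just be another source function registered with the pipeline, with no changes to the GUI or the other feeds.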

Comment by touchchoice 22 hours ago

Here's my thorough analysis after reviewing the entire project:

---

## Verdict: Not malicious

This is an *OSINT (Open Source Intelligence) dashboard* called "ShadowBroker" that aggregates publicly available real-time data — flights, ships, satellites, CCTV, news, radio, weather, earthquakes, stock markets, and geopolitical events — onto a map. The name references the infamous hacking group but the code itself contains no malware.

---

## What `start.sh` does

1. Checks for Node.js and Python 3

2. Creates a Python venv and installs dependencies from `requirements.txt`

3. Installs npm packages from `frontend/package.json`

4. Runs `npm run dev` which starts both a Next.js frontend and a FastAPI (uvicorn) backend

*No obfuscated commands, encoded payloads, curl/wget to suspicious URLs, reverse shells, or hidden steps.*

---

## What the full codebase does

It fetches data from these *legitimate public sources*:

| Category | Sources |
|---|---|
| Aviation | adsb.lol (open ADS-B), OpenSky Network (OAuth2) |
| Maritime | aisstream.io (AIS vessel tracking) |
| Satellites | CelesTrak (NORAD TLEs), SGP4 propagation |
| CCTV | TfL London, Singapore LTA, Austin TX, NYC DOT, OpenStreetMap |
| News | NPR, BBC, Al Jazeera, NYT, GDACS, NHK RSS feeds |
| Radio | Broadcastify (scraping), OpenMHz API |
| Weather | RainViewer |
| Earthquakes | USGS GeoJSON feed |
| Markets | Yahoo Finance (defense stocks, oil) |
| Geopolitics | GDELT, Liveuamap (Playwright scraping) |

---

## Things that are NOT present (good signs)

- No data exfiltration: nothing sends your personal data anywhere

- No reverse shells or backdoors

- No cryptominer code

- No encoded/obfuscated payloads

- No filesystem scanning or credential harvesting

- No network scanning or port scanning

- The `subprocess.run` call in `network_utils.py` uses argument lists (not `shell=True`), preventing command injection

---

## Noteworthy concerns (not malicious, but worth awareness)

1. *`cloudscraper` + Playwright stealth* — Used to bypass Cloudflare/Turnstile protections on Liveuamap and OpenMHz. Legally gray (may violate those sites' ToS).

2. *CORS wide open* (`allow_origins=["*"]`) in `main.py`: acceptable for a local-only tool, but it means any website you visit could make requests to your local backend on port 8000 while it's running.

3. *API key management* — The `/api/settings/api-keys` PUT endpoint writes to `.env` on disk. It does validate against a whitelist of known keys and rejects newlines, but it's exposed without authentication on localhost.

4. *Resource consumption* — The scheduler makes hundreds of outbound API calls per hour from your IP to public services (ADSB, OpenSky, CelesTrak, USGS, RSS feeds, etc.).

5. *UAV data is fake* — `fetch_uavs()` generates simulated drone positions in conflict zones. It's not real tracking data.

6. *Dependencies are all legitimate* — `fastapi`, `yfinance`, `feedparser`, `playwright`, `beautifulsoup4`, `requests`, `sgp4`, etc. are all well-known Python packages. Frontend deps (Next.js, React, MapLibre, Tailwind) are standard.

---

*Bottom line*: Safe to run. It's a hobbyist OSINT dashboard with an edgy name. No malicious behavior detected anywhere in the codebase.

Comment by driverdan 1 day ago

What's with so many people creating new accounts to promote LLM generated projects? Are they people who don't care about HN and just trying to self promote? Existing users creating new accounts? Lurkers?

Comment by beepbooptheory 1 day ago

It's a bummer because sometimes the headline seems cool, but it's always generated blah blah recently. I don't think I've seen a non-AI readme on here in months..

Everyone has their own heuristic, but if it took someone 6 hours or whatever to make some whole big app, my confidence that they will continue to maintain or care about it even next week is pretty much zero... How could they? They've already made three other apps in that time!

I don't care if the code is perfect, all this stuff just has the feel of plastic cutlery, if that makes sense.

Comment by polynomial 1 day ago

Plastic cutlery is a dead-on perfect analogy.

Comment by gregjw 1 day ago

Plastic cutlery, thats great.

Comment by alephnerd 1 day ago

How is this AI slop? It seems functional and actually reminds me of a couple alphas I saw of similar threat intel products 10-15 years ago.

Of course it's commoditized and a dime-a-dozen today, but if this is what HN terms as "AI slop" then apparently human SWEs weren't that much better.

Comment by driverdan 1 day ago

I never said AI slop.

Comment by alephnerd 1 day ago

Ah! I misinterpreted your comment then!

Comment by btbuildem 1 day ago

Lol please at least clean up the markdown diagram -- claude has a real hard time aligning the borders in ascii art for some reason.

Comment by mentalgear 1 day ago

dont give these OSINT quality signals away ... that's one of the indicators that allow you on first scan to id (potentially) low quality content. Ie: fully llm gen; the author doesnt look over the docs or doesnt care for 'details'.

Comment by vancecookcobxin 1 day ago

Thank you for the heads up! Will do.

Comment by totetsu 1 day ago

“The first Matrix I designed was quite naturally perfect, it was a work of art, flawless, sublime; a triumph equaled only by its monumental failure. The inevitability of its doom is apparent to me now as a consequence of the imperfection inherent in every human being. Thus I redesigned it, based on your history, to more accurately reflect the varying grotesqueries of your nature. However, I was again frustrated by failure. ”

Comment by erichocean 1 day ago

Yup, I had Claude write a tool to auto-fix those diagrams. :D

Comment by syskuh 1 day ago

[dead]

Comment by the_biot 1 day ago

[flagged]

Comment by tomhow 1 day ago

We are sympathetic, but it's still not OK to fulminate on HN, no matter what it's about. It just makes the place miserable. Please flag it or email us (hn@ycombinator.com) if you think a post is unfit for HN.

https://news.ycombinator.com/newsguidelines.html

Comment by razodactyl 1 day ago

@dang - HN tearing itself apart over use of AI isn't conducive to a strong cohesive community.

Nobody here is at fault, we're in very trying times - we need to adjust with patience and consideration.

Use of AI to launch rapid prototypes is like breadboarding a new product. It has a place but it's moving so fast that it's hard to lock down at the moment.

No point everyone throwing excess cortisol in this direction. <3

Comment by the_biot 1 day ago

Very true, I see people increasingly polarized on this topic. I also see it in the rollercoaster of votes on my post.

If it wasn't clear, I think we're (as a society) destroying ourselves by believing in all this generative AI crap, even contrary to the evidence of how wrong it often is, the hallucinations, the awful quality etc.

I think we're witnessing the death of intellect: when you discard the evidence in favor of something that only looks right but is nonsense, there's no telling where it will end. If your profession requires you to think and produce output accordingly, but suddenly nobody thinks wrong answers matter, then your profession no longer exists.

Standing up against it and refusing to accept any form of AI anywhere is the only reasonable thing to do. And I don't know if it will make a difference.

Comment by threethirtytwo 1 day ago

This is actually really good. Like this kind of app built before AI everyone would praise it.

It's only slop because anyone can make it now and we're all sick of clones.

The app is good, but the effort required to make it is not impressive at all. I think calling this slop is a misnomer. It's not slop. It's better than what most of us can do and done in significantly less time. Calling it slop implies you can do better... which you can't.

Comment by ratsimihah 1 day ago

I find non-constructive feedback more tiring. People just dismiss things as soon as they have the faintest trace of AI without judging them for what they actually are.

Not saying the AI slop noise isn’t annoying though.

Comment by bakugo 1 day ago

Why are you entitled to receiving constructive feedback on "your" project when you couldn't be bothered to write the project yourself in the first place?

If you want "feedback" of the same quality and effort as the project itself, you can always go ask your beloved AI for feedback instead of wasting precious human time.

Comment by ratsimihah 1 day ago

Would you dismiss solutions to mathematical problems solved by AI?

If I’m driving an AI towards finding a solution, would it be any different for a software project?

Comment by spzb 1 day ago

Mathematical proof vs a web app that doesn't actually run? Not much of a contest.

Never mind the fact that AIs of the LLM-variety haven't and aren't going to find solutions to mathematical problems.

Comment by sdoering 1 day ago

> Never mind the fact that AIs of the LLM-variety haven't and aren't going to find solutions to mathematical problems.

This is empirically wrong as of early 2026.

Since Christmas 2025, 15 Erdos problems have been moved from "open" to "solved" on erdosproblems.com, 11 of them crediting AI models. Problems #397, #728, and #729 were solved by GPT-5.2 Pro generating original arguments (not literature lookups), formalized in Lean, and verified by Terence Tao himself. Problem #1026 was solved more or less autonomously by Harmonic's Aristotle model in Lean.

At IMO 2025, three separate systems (Gemini Deep Think, an OpenAI system, and Aristotle) independently achieved gold-medal performance, solving 5 of 6 problems.

DeepSeek-Prover-V2 hits 88.9% on MiniF2F-test. Top models solve 40% of postdoc-level problems on FrontierMath, up from 2%.

Tao's own assessment as of March 2026: AI is "ready for primetime" in math and theoretical physics because it "saves more time than it wastes."

You can disagree about where this is heading, but "haven't and aren't going to" doesn't survive contact with the data.

Comment by fredoliveira 1 day ago

Indeed. And adding on to this, in a slightly different realm, Donald Knuth's conjecture that he solved with Claude: https://www-cs-faculty.stanford.edu/%7Eknuth/papers/claude-c...

Comment by spzb 1 day ago

> solved more or less autonomously

So, not autonomously.

Comment by sdoering 1 day ago

q.e.d.

Comment by ratsimihah 1 day ago

You got really specific to help prove your point. We were generalising to projects built by AI, not web apps that don’t run, which isn’t relevant since LLMs can clearly build fully working projects.

Also how does getting into the specifics of which type of AI can solve mathematical problems help the comparison here?

Comment by spzb 1 day ago

You were the one who made the comparison

Comment by enraged_camel 1 day ago

Man, the overwhelming majority of your comments over the past several months are you whining about AI or being extremely salty about anything remotely AI related. You bash AI content, people who use AI to make cool stuff, AI companies, people who say anything positive about said companies... I really wonder what exactly you think your negative attitude contributes to these discussions.

Comment by monkaiju 1 day ago

I think it contributes to a general pushback against AI, which some of us appreciate...

Comment by bakugo 1 day ago

It contributes far more than yet another low effort AI-generated Show HN on top of the dozens already submitted every day.

If you think you made "cool stuff" with AI, great, enjoy it, but also please keep it to yourself because anyone else can generate the exact same thing if they want it, you are not special, and are actively drowning out real human effort and passion.

Comment by enraged_camel 1 day ago

How is that any different than your incessant whinings drowning out real human discussion?

Comment by hammock 1 day ago

[flagged]

Comment by tomhow 1 day ago

Please don't introduce off-topic flamebait here.

Comment by beoberha 1 day ago

Sounds like OP did some pretty cool engineering to make this run performantly. Definitely not your run of the mill AI slop.

Comment by DetroitThrow 1 day ago

It doesn't run at all. If you can get it running, let me know.

Comment by mentalgear 1 day ago

why? at least the chart in the docs suggests otherwise.

Comment by serf 1 day ago

eh.

performance is easy. you can craft a test suite that will allow a ralph loop to iterate until it hits the metrics.

the hard part is style/feel/usability. LLMs still suck at that stuff, and crafting tests to produce those metrics is nigh impossible.

Comment by ionwake 1 day ago

[flagged]

Comment by khaki54 1 day ago

you can't downvote replies so it wasn't him

Comment by ionwake 1 day ago

oh looks like im a tit. thanks for the clarification.

Comment by hackerbeat 1 day ago

[flagged]

Comment by top_sigrid 1 day ago

Is this related in any way to the post or just spam?

Comment by hackerbeat 1 day ago

It's to make you forget about catastrophe porn and have a laugh instead.

Comment by rcbdev 1 day ago

The first one I got was about how apparently for U.S.-Americans, health insurance does not cover dental and ocular health. Reading that actually made me feel really bad for U.S.-Americans.