Show HN: 83 browser-use trajectories, visualized

Posted by wayy 17 hours ago

Hey all, Justin here. I previously built Phind, the AI search engine for developers.

One of the biggest problems we had there was figuring out what went wrong with bad searches. We had tons of searches per day, but less than 1% of users gave any explicit feedback. So we were either manually digging through searches or making general system improvements and hoping they helped.

This problem gets harder with agents: traces are longer and more complex, so reviewing them takes more effort. I'm building a tool that lets you analyze LLM outputs directly, to help developers of LLM apps and agents understand where things are breaking and why.

I've put together a demo using browser-use agent traces (gpt-5): https://trails-red.vercel.app/viewer

It's early, but I have lots of ideas, such as live querying of past failures for currently running agents and preference models to expand sparse signal data.

Would love feedback on the demo. Also, if you're building agents and have 10k+ traces per day that you're not looking at but would like to, I'd love to talk.

Comments

Comment by Johnny_Bonk 13 hours ago

This is a cool project. I've also been trying to find some sort of leaderboard or benchmark to compare browser agents. I personally really like the Claude in Chrome agent, but unfortunately I don't think I can build it into projects yet.