Show HN: BrowserOS – "Claude Cowork" in the browser
Posted by felarof 1 day ago
Hey HN! We're Nithin and Nikhil, twin brothers building BrowserOS (YC S24). We're an open-source, privacy-first alternative to the AI browsers from big labs.
The big differentiator: on BrowserOS you can use local LLMs or BYOK and run the agent entirely on the client side, so your company/sensitive data stays on your machine!
Today we're launching filesystem access... just like Claude Cowork, our browser agent can read files, write files, run shell commands! But honestly, we didn't plan for this. It turns out the privacy decision we made 9 months ago accidentally positioned us for this moment.
The architectural bet we made 9 months ago: Unlike other AI browsers (ChatGPT Atlas, Perplexity Comet) where the agent loop runs server-side, we decided early on to run our agent entirely on your machine (client side).
But building everything on the client side wasn't smooth. We initially built our agent loop inside a Chrome extension, but we kept hitting walls -- the service worker is single-threaded JS, and we had no access to Node.js libraries. So we made the hard decision 2 months ago to throw away everything and start from scratch.
In the new architecture, our agent loop sits in a standalone binary that we ship alongside our Chromium. And we use gemini-cli for the agent loop with some tweaks! We wrote a neat adapter to translate between Gemini format and Vercel AI SDK format. You can look at our entire codebase here: https://git.new/browseros-agent
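To make the adapter idea concrete, here is a minimal sketch of a two-way message translation. It is not the BrowserOS adapter: the types below are simplified, hypothetical stand-ins for the Gemini content format and the Vercel AI SDK message format (text-only, no tool calls), and the function names `geminiToSdk` and `sdkToGemini` are made up for illustration.

```typescript
// Simplified, hypothetical shapes. The real Gemini and AI SDK types
// carry more fields (tool calls, images, system prompts, and so on).
type GeminiPart = { text: string };
type GeminiContent = { role: "user" | "model"; parts: GeminiPart[] };
type SdkMessage = { role: "user" | "assistant"; content: string };

// Gemini names the assistant role "model"; the AI SDK names it "assistant".
// Text parts are concatenated into a single content string.
function geminiToSdk(contents: GeminiContent[]): SdkMessage[] {
  return contents.map((c): SdkMessage => ({
    role: c.role === "model" ? "assistant" : "user",
    content: c.parts.map((p) => p.text).join(""),
  }));
}

// Reverse direction: each message becomes one content entry with one text part.
function sdkToGemini(messages: SdkMessage[]): GeminiContent[] {
  return messages.map((m): GeminiContent => ({
    role: m.role === "assistant" ? "model" : "user",
    parts: [{ text: m.content }],
  }));
}
```

A real adapter would also have to map tool calls and streaming chunks, which is presumably where most of the work is.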
How we give browser access to filesystem: When Claude Cowork launched, we realized something: because Atlas and Comet run their agent loop server-side, there's no good way for their agent to access your files without uploading them to the server first. But our agent was already local. Adding filesystem access meant just... opening the door (with your permissions ofc). Our agent can now read and write files just like Claude Code.
What you can actually do today:
a) Organize files in my desktop folder https://youtu.be/NOZ7xjto6Uc
b) Open top 5 HN links, extract the details and write summary into a HTML file https://youtu.be/uXvqs_TCmMQ
---
Where we are now: If you haven't tried us since the last Show HN (https://news.ycombinator.com/item?id=44523409), give us another shot. The new architecture unlocked a ton of new features, and we've grown to 8.5K GitHub stars and 100K+ downloads:
c) You can now build more reliable workflows using n8n-like graph https://youtu.be/H_bFfWIevSY
d) You can also use BrowserOS as an MCP server in Cursor or Claude Code https://youtu.be/5nevh00lckM
We are very bullish on the browser being the right platform for a Claude Cowork-like agent. The browser is the most commonly used app by knowledge workers (emails, docs, spreadsheets, research, etc). And even Anthropic recognizes this -- for Claude Cowork, they have a janky integration with the browser via a Chrome extension. But owning the entire stack allows us to build differentiated features that wouldn't be possible otherwise. Ex: Browser ACLs.
Agents can do dumb or destructive things, so we're adding browser-level guardrails (think IAM for agents): "role(agent): can never click buy" or "role(agent): read-only access on my bank's homepage."
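As a rough illustration of what such rules could look like as data, here is a sketch of a deny-list policy check. Everything in it is an assumption made for the example: the `Rule` shape, the `isAllowed` function, and the `bank.example` URL are hypothetical and not taken from the BrowserOS codebase.

```typescript
type Actor = "agent" | "user";
type Action = "read" | "click" | "type";

// Hypothetical rule shape: deny the listed actions for an actor,
// optionally scoped to a URL prefix and/or an element label.
interface Rule {
  actor: Actor;
  deny: Action[];
  urlPrefix?: string;    // omitted: applies to every page
  labelPattern?: RegExp; // omitted: applies to every element
}

interface ActionRequest {
  actor: Actor;
  action: Action;
  url: string;
  label: string; // visible text of the target element
}

const rules: Rule[] = [
  // "role(agent): can never click buy"
  { actor: "agent", deny: ["click"], labelPattern: /\bbuy\b/i },
  // "role(agent): read-only access on my bank's homepage"
  { actor: "agent", deny: ["click", "type"], urlPrefix: "https://bank.example/" },
];

// Allowed by default; denied as soon as any rule matches the request.
function isAllowed(req: ActionRequest, policy: Rule[]): boolean {
  return !policy.some((rule) =>
    rule.actor === req.actor &&
    rule.deny.indexOf(req.action) !== -1 &&
    (rule.urlPrefix === undefined ||
      req.url.slice(0, rule.urlPrefix.length) === rule.urlPrefix) &&
    (rule.labelPattern === undefined || rule.labelPattern.test(req.label))
  );
}
```

The sketch is allow-by-default with explicit denies. A stricter design would deny by default and list what the agent may do, which is closer to how IAM policies are usually written.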
Curious to hear your take on this and the overall thesis.
We’ll be in the comments. Thanks for reading!
GitHub: https://github.com/browseros-ai/BrowserOS
Download: https://browseros.com (available for Mac, Windows, Linux!)
Comments
Comment by arjunchint 1 day ago
I still don't buy the "we needed it to be a whole Browser and not a Chrome Extension" argument:
- your interface is still literally a chrome extension side panel
- none of the agentic browsers from the bigger players like Atlas and Comet really took off either
I do think the server side integration is required:
- with rtrvr.ai a ton of users are integrating our web agent chrome extension via Remote MCP from chatgpt.com, as well as triggering it remotely as an API endpoint. Your implementation is limited to only local connections as I understand.
- the biggest unlock for users is running at scale, so just being able to launch a hundred cloud browsers, do a task, and return results while you do other things. So we see hybrid cloud/local execution as the key unlock for this year
Your workflow pipeline is really cool! Any blog post/summary on how you set it up?
Last year was a lot of technical builders exploring the capabilities, and I am excited for this year of making these agentic browsers useful!
Comment by johnsmith1840 1 day ago
One simple example is that an extension can't see cross-origin iframes. This means it could never do something like fill out a payment form for you if it's an extension.
Limited computation and action space is another, as are bot detection systems.
For example, a JavaScript method trying to automate something like Microsoft Word in an iframe will have a tough time, because the second you inject code in there they will block you.
Comment by arjunchint 1 day ago
Sounds like a skill issue: our web agent is able to interact with cross-origin iframes to, for example, solve captchas: https://www.youtube.com/watch?v=LD3afouKPYc
We honestly haven't faced any bot detection or blocking issues. Owning the browser layer exposes you to much more detection; just look at Comet getting blocked on Amazon, etc.
Comment by johnsmith1840 1 day ago
Comment by johnsmith1840 1 day ago
Comment by quarkcarbon279 1 day ago
Comment by felarof 1 day ago
> whole Browser and not a Chrome Extension argument
Both of us are definitely biased to think our own approach is better :)
But without owning the binary, we couldn't have shipped today's feature -- an agent with access to your filesystem that can run shell commands, like Claude Cowork.
> your interface is still literally a chrome extension side panel
Yep, our interface is a chrome extension to make iterating on the UX faster. But it uses a ton of C++ APIs that we expose under `chrome.browseros.*`
> Your workflow pipeline is really cool! Any blog post/summary on how you set it up?
Thanks! We'll look into publishing a blog soon!
Comment by arjunchint 1 day ago
A Chrome extension can also access local files and execute LLM-generated code in sandboxes.
Comment by felarof 19 hours ago
Comment by tekacs 1 day ago
I didn't hear back there, but huzzah, it looks like this is in there. I'm glad to see it!
Comment by felarof 1 day ago
Yes, we expose BrowserOS as an MCP server -- that you can use from claude code, cursor, opencode, etc -- https://docs.browseros.com/features/use-with-claude-code
The MCP server works out of the box (unlike Chrome DevTools MCP, which requires tricky setup).
Comment by jm4 1 day ago
You guys need some marketing help. There’s a lot of potential here, but you don’t do a good job of selling it. Tell me what problems I’m going to be able to solve or what headaches it will eliminate. Can it go into that shitty Canvas app my kids’ school uses, identify outstanding assignments or low grades and send me a daily text summary? Can it automate buying everything on my grocery list and setting up delivery? Or look up flight options, ask me what I want and book it for me? Even better, I’m stuck having to look up international flights for 7 people in three households, get everyone to agree on one and then book them. Please build something that will do that.
Keep at it because this thing is cool!
Comment by felarof 19 hours ago
Thank you for the feedback. Ack, we need to do a better job of marketing.
> How do you plan to monetize it?
Our goal is to eventually sell licenses for enterprise browsers.
Comment by jm4 17 hours ago
For example, it would be really neat to trigger jobs that perform some task and then make a call to Twilio or something to send an alert. Or some building blocks that tie into my Square account or Amazon account. I want to be able to describe the results I want, but I don’t want to explain how to interact with a particular service and then test that.
I would love to be able to give a prompt like this: “review my item library in Square, identify items that are missing descriptions or are miscategorized, propose the fixes, and confirm with me before making any changes.” That’s an extremely tedious task that requires a lot of clicking and page loads. I hate it and I would pay for your product if you could save me that time.
Or this: “Every month, alert me to any fluctuations in product cost and which items in my Square catalog are affected. Highlight any items where my COGS exceeds 35%. All the invoices are available in my email.” That would be incredibly powerful. Doing this manually can take days.
Comment by felarof 15 hours ago
You could try this use case on our agent builder even today. We also have scheduled tasks, so you can schedule it to run monthly.
> Have you ever thought about a marketplace for premade workflows?
We want to do this and are moving towards that! But we first need to make the premade (or user published) workflows very reliable.
Comment by 4b11b4 1 day ago
Comment by felarof 1 day ago
> how is it reliably enforced?
At the Chromium level, you have access to every single DOM element and the coordinate space around it. So, when a click happens, whether from the user or the agent, we have a neat way of enforcing the required action (either allow it or nullify the click).
We are still at an early version, and mostly targeting enterprise sites (like SAP) which don't change that often.
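One way to picture the allow-or-nullify decision is as a hit test of the click coordinates against the bounding boxes of protected elements. The sketch below is only a guess at the general shape, not how BrowserOS implements it: `Rect`, `ProtectedRegion`, and `gateClick` are hypothetical names, and a real browser would work from live layout data rather than a static list.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

// Hypothetical: an element the policy marks as off-limits to the agent.
interface ProtectedRegion { label: string; rect: Rect }

type Verdict = "allow" | "nullify";

// True when point (px, py) falls inside the rectangle (right/bottom edges exclusive).
function contains(r: Rect, px: number, py: number): boolean {
  return px >= r.x && px < r.x + r.width && py >= r.y && py < r.y + r.height;
}

// User clicks always pass; agent clicks are dropped when they land on a protected region.
function gateClick(
  source: "user" | "agent",
  px: number,
  py: number,
  regions: ProtectedRegion[],
): Verdict {
  if (source === "user") return "allow";
  return regions.some((region) => contains(region.rect, px, py)) ? "nullify" : "allow";
}
```

Checking at the coordinate level means the rule still holds if the agent clicks by screen position instead of by selector.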
What use case did you have in mind?
Comment by mossTechnician 1 day ago
This sounds interesting, but where would I go to see these guardrails and their implementation? I tried searching in the repository and couldn't find them.
Comment by felarof 1 day ago
What use case did you have? Happy to show a demo of the current version we have (you can hit me up on Discord or Slack -- links available on our repo).
Comment by Johnny_Bonk 1 day ago
Comment by felarof 1 day ago
What angle are you looking at this from? Is it for convenience? Or do you not like terminal UI and need a web-friendly UI for these agents?
Comment by thawab 1 day ago
Comment by felarof 19 hours ago
On top of that, if you want headful mode, you can use our MCP server https://docs.browseros.com/features/use-with-claude-code
Would love to understand your use case! You can hit me up at nithin[at]browseros.com
Comment by devld 19 hours ago
Comment by felarof 16 hours ago
I think the big hurdle is mostly education / a shift in mindset. We are so used to doing tasks manually that most of us (including me) don't pause to think about whether we should be doing this ourselves or could give it to an agent.
Comment by grigio 1 day ago
Comment by felarof 20 hours ago
But if you want to use our browser in headless mode with Playwright, that would work too! (we are a Chromium fork)
Comment by rahimnathwani 1 day ago
Comment by felarof 1 day ago
Comment by ivysly 1 day ago
Comment by felarof 1 day ago
We see a future where it’s the main gateway to everything, and where agents live and work alongside you inside the browser. That’s why we call it BrowserOS. :)
Comment by p1necone 1 day ago
Sure regular consumer stuff like social media is webapps (if they're not mobile only), and if you're interacting with like salesforce or a customer support tracker or an issue tracker or something you're likely using a webapp, but the move to mobile devices for most consumer stuff means that people still using PCs are largely power users.
Comment by felarof 1 day ago
Precisely. I think most knowledge work (especially at businesses) still happens in the browser. That is the workflow we want to target!
Comment by ripped_britches 1 day ago
Comment by sbsnjsks 1 day ago