How we run Firecracker VMs inside EC2 and start browsers in less than 1s

Posted by gregpr07 1 day ago

Comments

Comment by losteric 4 hours ago

> Plain headless Chromium is easy to detect by websites with anti-bot measures. Plain headless Chromium avoided getting blocked by websites only 2% of the time, according to our stealth benchmark.

> Our browsers avoid blocks 81% of the time on our stealth benchmark, and 84.8% on Halluminate BrowserBench, the highest of any provider.

Seems very unethical, no? Who uses service providers like this? The whole point of anti-bot measures is to get rid of bots - you are not wanted there.

These kinds of services inevitably make the web more human-hostile and expensive. Websites will continue pushing back on automated usage, meaning more hurdles to access content.

No doubt part of why we see this push for verified ID on the web - not just age gating and "protect the children", but also protect sites from bots, and protect ad revenue (not a statement of support; just seems like an obvious higher order effect)

Comment by baby_souffle 2 hours ago

> Who uses service providers like this?

I use change detection to monitor all sorts of websites for changes. Some of my favorite authors don't have RSS. I always set up price monitoring for any big ticket item I'm considering like appliances so I can see how their pricing changes over time. I also use scrapers for websites that don't have an API. I like having all of my purchase history indexed in a database where I can do analysis.

> These kinds of services inevitably make the web more human-hostile and expensive.

I would rather not have to spend more time circumventing stupid bot detection things. I would be more than happy to pay for access to some of this data that I cannot access any other way.. but sure, let's keep burning resources on a cat and mouse game that scrapers will always be able to win.

Comment by arianvanp 1 hour ago

The litmus test here is whether they support https://blog.cloudflare.com/introducing-pay-per-crawl/ out of the box or not

They do not.

Comment by mikeocool 3 hours ago

Whether or not scraping publicly available websites is unethical is probably up for debate. In some cases at least, courts have found it to be legal, even when the site is throwing up technical barriers or issuing cease and desists.

What is likely unethical is the fact that they offer residential proxies. The residential providers of those proxies are frequently not aware they’ve been opted in to provide such a service.

Comment by eab- 2 hours ago

> courts have found it to be legal

≠ ethical

Comment by MayCXC 58 minutes ago

I built a similar system for an identity protection service that automated removing PII from directory websites like whitepages. Which was less ethical, stealth browser automation or monetized privacy invasion?

Comment by embedding-shape 4 hours ago

> Seems very unethical, no? Who uses service providers like this? The whole point of anti-bot measures is to get rid of bots - you are not wanted there.

Unethical just because it does something someone else doesn't want? I guess it depends on why and what the intention is. I don't have time to sit 24/7 in front of a computer to get a ticket to some events, does that mean it's unethical for me to use my own bot so I can purchase a ticket to bands I'm a fan of? Probably not. But if I did so for scalping purposes? Then yeah, I'd agree it's unethical.

The whole point of anti-anti-bot measures is to be able to do things even if others don't think that thing should be automated, so from the hacker news audience, I think quite a lot of us have at one point or another engaged in stuff like that. Doing so merely for profits of course stinks, but for you to be able to have a fighting chance against scalpers? Probably OK.

Comment by mystifyingpoi 3 hours ago

> even if others don't think that thing should be automated

It's an interesting thought that can be further explored. Could anything that's considered "unwanted" by a third party be considered unethical, if I do it anyway?

If the hotel self-service restaurant has a sign "don't take the food out" and I take 1 apple in my pocket for a snack, is it unethical? Or maybe the sign is just for people that would otherwise take $100 of watermelons out of the cantina daily and try to resell it on the beach.

Comment by turtlebits 3 hours ago

It's unethical because you're intentionally bypassing restrictions. Just because others do it doesn't mean it's okay.

If you saw a sign in a store that said "1 per person" or "for registered guests only", would you ignore it?

Comment by orf 3 hours ago

Was Rosa Parks unethical for sitting down on a bus?

The point is that the context matters: both the users context and the context of the restriction. It’s not as clear cut as “ignoring restrictions = bad”.

The restriction itself can be unethical, in the same way that bypassing a restriction can be unethical.

Comment by BoorishBears 43 minutes ago

Woah now, I'm for headless browsers but let's not start comparing any of this to Rosa Parks lol.

The reality is a lot of interesting, trivially harmful to non harmful things are illegal and we still do them anyways.

Comment by windexh8er 2 hours ago

Look at what Google's doing right now with Chrome. On June 30 Chrome will remove the last flag that let uBlock keep working, and there's no workaround. Google says it's about security and performance, but is it? $239 billion in ad revenue last year seems to be the motivational factor. The "restriction" is a rule written by the company that profits when you can't block its ads, dressed up as protecting you. But... CISA recommends ad blockers as a defense against malware spread through ad networks.

The rules aren't always right and sometimes have unintended consequences. I think a bigger issue than Browser Use is all of the copyrighted material in every LLM. Given that precedent has been set with zero legal consequences, I'm not sure there's much of a leg for you to stand on here.

Comment by embedding-shape 3 hours ago

> It's unethical because you're intentionally bypassing restrictions

I'd still consider why the restriction is there and why I'm thinking of breaking it, before deciding if it's unethical or not.

It depends, basically. Generally I follow the rules and restrictions, but maybe see them more as guidelines or suggestions.

Comment by kube-system 1 hour ago

There are many ethical reasons to bypass restrictions. Colloquially, we just call them exceptions.

There are many valid ethical exceptions for evading anti-bot detections. For example: you are a white hat actor scraping a black hat site. There are hundreds of other plausible examples.

Comment by jamiequint 3 hours ago

You're confusing law with ethics, they are not the same.

Comment by joatmon-snoo 4 hours ago

An example I ran into recently: I wanted to scrape pricing data for used cars, to better inform a friend's decision about what to purchase.

I know there's a relationship between mileage and depreciation, but wanted to have a better sense of what that relationship is to know whether a given car was over or underpriced.

Similarly, if I was pulling that data to build a service of my own to offer to users... is that unethical?

Comment by sroussey 3 hours ago

All of these questions are easily answered by the question: can I run the bot on the same PC I use regularly? If so, then do it there. If not, then don’t do it at all.

Comment by adolph 1 hour ago

> scrape pricing data for used cars

Time was you could get lovely json feeds from every site by iterating the inspector curl statement. Now-a-days you can't even use Selenium without Cloudflare getting grouchy. Last fall had to make my spreadsheet like a cave-person control c, control v. It wouldn't be so bad if the dealer aggregators' coverage was xor, but you have to dedupe listings. Then there is the whole online salespeople who don't show up at the dealership.

Comment by skybrian 3 hours ago

What do you think of Anubis and Cloudflare? If they block your bot, is that unethical?

Seems like doing business with other people should normally be based on mutual consent, not whatever you can get away with technically.

Comment by wnevets 4 hours ago

> Who uses service providers like this?

People who don't want their headless browser to get blocked?

Comment by nateb2022 4 hours ago

> Seems very unethical, no? Who uses service providers like this? The whole point of anti-bot measures is to get rid of bots - you are not wanted there.

I'm familiar with companies automating access to software only accessible via the web with poor/no API support. This is software they pay (usually a lot of money) for, and usually has built in captchas to guard logins. They aren't a large enough customer to ask for the removal of these captchas or to be whitelabelled (just one out of many SaaS tenants), so they simply work around that restriction.

Comment by mystifyingpoi 4 hours ago

> Seems very unethical, no?

I don't think one can judge it ethically without considering the context. Are we talking about mass automated scraping? Or are we talking about me trying to get a good deal by scraping local used car dealership listing once per day for my personal need (just so I don't have to do it manually)?

One of these is strictly more ethical, but both will be blocked by Cloudflare for example. I'd happily use such service in my personal case.

Comment by dagi3d 2 hours ago

Obviously don't know what percentage represents "legit" use cases vs other more morally questionable, but in our case we have a cms where content team can include external links and we need to verify periodically whether those links work or not, which is not as easy as making get requests with a client.

Comment by sillysaurusx 4 hours ago

(I haven't tried this out yet.) My use case would be to take a snapshot of each HN story. This is surprisingly hard, because most websites prevent bots from doing that.

For example, Claude has a lot of trouble reading HN's front page. HN itself is fine, but the moment you ask it to pick out an article, it often chokes. The website has put up a verification captcha, or it's a paywall, etc. Paywalls can be bypassed by reading HN comments and looking for archive links. But those archives often block bots too, so you're back to square one.

Whether it's unethical is an interesting question. I believe I should have the right to do what I want with internet content, as long as I'm not abusive. Merely having a bot isn't abusive. It would be one thing if the bot is hammering a server or vacuuming up training data, but having a bot at all is presently very hard.

This service caught my attention because it could potentially solve the problem I'm running into. Simply taking snapshots of articles that hit HN shouldn't be so hard, but it is. HN sends millions of views to websites; one bot taking a snapshot isn't going to make a difference. I don't think it counts as "unethical" just because we're going against the website owner's wishes. When you post content to the internet, you sign up to share that content with everyone, other than what's denied by robots.txt. If it's not blacklisted by robots.txt, it should be possible for well-behaved bots to access.

I don't expect very many people here to care about the poor bot creators. Most of the bot creators are malicious anyway. But I personally lament the loss of being able to write a program that can process information from the browser in arbitrary ways. You should be able to, yet we're buying into the notion that it's okay for website owners to say "this content is only accessible by approved bots like Google, and everyone else can sod off."

HN proves it doesn't need to be like that. It gets dozens of millions of page views a day, a lot of which is bot traffic. HN only uses captchas for creating accounts or logging in. You're free to scrape any content as long as you respect the crawl delay of 30 seconds specified in robots.txt, and don't try to visit links that perform actions a human would take (like adding things to favorites or voting). That's how the internet should work: just deliver content.
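
A well-behaved bot of the kind described above can read these rules with Python's standard `urllib.robotparser`. This is a minimal sketch: the robots.txt text, the bot name, and the example URLs are hypothetical stand-ins, not a copy of any real site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, loosely modeled on the rules described above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /vote
Disallow: /fave
Crawl-delay: 30
"""


def load_rules(text):
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def fetch_plan(parser, agent, urls):
    """Return (allowed_urls, delay_seconds) for a polite crawl."""
    allowed = [u for u in urls if parser.can_fetch(agent, u)]
    delay = parser.crawl_delay(agent) or 0  # None when no delay is declared
    return allowed, delay


rules = load_rules(ROBOTS_TXT)
allowed, delay = fetch_plan(rules, "my-snapshot-bot", [
    "https://example.com/item?id=1",
    "https://example.com/vote?id=1",
])
```

The crawler would then sleep for `delay` seconds between requests to the URLs in `allowed`.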

Comment by dist-epoch 2 hours ago

> one bot taking a snapshot isn't going to make a difference

until half of HN users start asking their agent to do the same, to summarize the top HN articles every day

Comment by ge96 3 hours ago

I briefly tried to do his job where it was scraping steam for CS GO skins (think a knife skin for $2,000.00) and yeah trying to find proxy providers/get around the ip limit... tough one but market for it people paying for the tool (not mine).

Comment by figmert 3 hours ago

Anti-bot measures also block real users at the slightest change they don't like. Anti-fingerprinting measure? You're a bot. Adblockers? You're a bot.

Comment by __alexs 2 hours ago

There's no ethical consumption of... ad supported content.

Comment by cute_boi 4 hours ago

Exactly. Crappy companies like Browser Use are causing more captchas, etc. All these scraper companies should've been regulated heavily. They use residential proxies, creating an incentive for hacking IoT devices, etc.

Comment by stogot 4 hours ago

I wish simpler bots existed for consumers. I want to know when someone replies to me, when a price drops, when airlines open new seat reservations, when a new seat opens for a college class, when a concert is coming to my area for a musician I listen to, when my local grocer has new stock, when a new Hyatt offer is available in a city I want to visit, etc. That doesn’t mean I’m abusive. I can have it check once a day. In almost all those cases, I want to spend money with the business but I don’t want to manually check.
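
A once-a-day check like this needs very little logic: remember a fingerprint of what the page said last time and compare it on the next visit. The sketch below assumes the page content has already been fetched by some other means; `check_for_change` and the in-memory `seen` dictionary are hypothetical names used for illustration.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Stable digest of the content, so full pages need not be stored."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def check_for_change(seen: dict, key: str, content: str) -> bool:
    """Return True when content differs from the previous check.

    The first sighting of a key is recorded but not reported as a change.
    """
    digest = fingerprint(content)
    previous = seen.get(key)
    seen[key] = digest
    return previous is not None and previous != digest


seen = {}
first = check_for_change(seen, "concerts", "No shows announced")
second = check_for_change(seen, "concerts", "No shows announced")
third = check_for_change(seen, "concerts", "New show: June 12")
```

In practice the `seen` mapping would be persisted between runs, and the content would be narrowed to the relevant part of the page so that unrelated markup changes do not trigger alerts.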

Comment by hollerith 2 hours ago

The people who've been in charge of the web (i.e., mostly the browser makers, but also the owners of the most popular sites) have made decisions that are IMHO severely anti-user. Although these anti-user design decisions have been accumulating for 30 years, users have had no alternative because all the content was on the web with no way to get it other than to visit web sites with a web browser.

Now that there is an alternative (namely AI) people (including me) are flocking to the alternative. You want to frame this as unethical bots versus ethically-acceptable human site visitors, but the main motivation for the use of scraping bots these days is to provide services (i.e., AI-based question answering) that users (like me) consider far superior to going directly to web sites for information because visiting web sites with a web browser is a frustrating tedious experience.

Comment by ranger_danger 4 hours ago

Web archival/preservation services/projects that need to get past captchas and other bot checks are a prime target for a service like this... but I think their main customers are people just mass scraping parts of the internet for less altruistic reasons.

Comment by zuzululu 3 hours ago

Once again I'd like to remind that violating Terms of Service isn't the same as violating some moral ethics. They are literally just expectations with no enforceable or legal boundaries.

For example I could write in my Terms of Service that you do not view more than one page on my website and expect you to send me a written permission to read the rest. I don't expect anybody to follow and I sure don't think less of those that do.

The push for verified IDs is not related to this, it's more of a politically motivated attempt at selling fear to justify more surveillance.

Comment by sudb 1 hour ago

Something elided here is that nested virtualization on regular EC2 instances has only been possible since February this year[1] - before this, you had to use a metal EC2 instance to run Firecracker VMs.

1. https://aws.amazon.com/about-aws/whats-new/2026/02/amazon-ec...

Comment by gregpr07 1 hour ago

Yeah pretty new stuff - officially it’s still not recommended but works really well so far! Finally we don’t have to run baremetal

Comment by hobofan 2 hours ago

I'm a bit surprised that with all this, they still stuck with Chromium.

We have a much less sophisticated setup in our web-access MCP server[0] where browser instances are spawned as subprocesses and the biggest win in stability, CPU and memory usage we had was in switching from Chrome to Lightpanda[1].

Fitting to the statement at the end of the article, the faster browser to boot might be one that allocates less memory in general.

[0]: https://github.com/EratoLab/web-access-mcp

[1]: https://lightpanda.io

Comment by Reformedot 2 hours ago

We decided to maintain Chromium as the engine for stealth purposes.

Browsers like LightPanda lack stealth entirely; they are trivial to detect. There are ways to make Chromium more performant, by removing everything that you don't need.

We believe that Chromium can reach that performance without starting an entire engine from scratch, and without losing stealth, a top priority for us.

The language is not the problem, C++ is as performant as Zig, but Chromium bloat is huge, agree on that.

Comment by smnscu 18 minutes ago

Firecracker is fantastic technology. I'm using it for my interviewing startup to run isolated runtimes for coding interviews (and personal workspaces), and it's been rock solid and incredibly lightweight. Interfacing with it through the Go SDK has been a piece of cake, too.

Comment by cgijoe 1 hour ago

> Next: skip Chromium startup

> This is complex, as a running browser has open devices, timers, graphics state, network state, and fingerprint state.

Hmm, can't you just keep a set of browsers already running, like a warm pool, ready to assign to an incoming request? The latency would be close to zero for the user. You'd need some prediction logic to expand / contract the warm pool based on traffic patterns, but that seems like the easiest solution to me.
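
The warm-pool idea can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `factory` callable that starts one browser or VM; the prediction logic is reduced to a manual `resize` call, and each instance is handed out exactly once rather than returned to the pool.

```python
from collections import deque
from typing import Callable, Deque, Generic, TypeVar

T = TypeVar("T")


class WarmPool(Generic[T]):
    """Keeps `target_size` pre-started instances idle and hands each out once."""

    def __init__(self, factory: Callable[[], T], target_size: int) -> None:
        self._factory = factory  # hypothetical: starts one browser/VM
        self._target = target_size
        self._idle: Deque[T] = deque()
        self.refill()

    def refill(self) -> None:
        """Start instances until the idle count reaches the target."""
        while len(self._idle) < self._target:
            self._idle.append(self._factory())

    def resize(self, target_size: int) -> None:
        """Grow or shrink the pool, e.g. driven by a traffic forecast."""
        self._target = target_size
        while len(self._idle) > self._target:
            self._idle.pop()  # a real pool would also shut the instance down
        self.refill()

    def acquire(self) -> T:
        """Return a warm instance if one is idle, else pay the cold start."""
        if self._idle:
            return self._idle.popleft()
        return self._factory()

    @property
    def idle_count(self) -> int:
        return len(self._idle)
```

A caller would run `refill()` in the background after each `acquire()`. The idle instances still consume memory and CPU while they wait.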

Comment by Reformedot 1 hour ago

Yes, warm pools work, but our goal is to replace them entirely.

Warm pools are nice, but in the end they also consume resources, and you need to always keep the pool warm, starting browsers to balance, etc.

With the upcoming changes we will keep Chromium startup and the VM will be ready in 50ms, beating warm pools entirely.

Also, some customers need special parameters and features, increasing warm pool complexity. The happy path will be fast but the edge case will be extremely slow, and we want to guarantee fast speeds no matter which features you need on the requested browser.

Comment by timojeajea 2 hours ago

We run a screenshot API (ApiFlash) with Chromium packaged in an AWS Lambda container image instead of Firecracker on EC2. AWS Lambda gives you the isolation and autoscaling for free which is ideal for spiky stateless work like screenshots. I believe we get mostly the same benefits compared to browser-use solution but with a much much simpler architecture. The tradeoff is the AWS lambda cold starts, but in practice sequential AWS Lambda invocations actually reuse a hot function. As a result, with a large enough volume, spikes are smoothed and cold starts are not that frequent.

Comment by Reformedot 1 hour ago

Not all use cases require all the features that we built

A few issues we had with Lambdas:

- Limited running time (15 min); we support up to 4 hours (we can run longer if needed)
- Price
- Lack of snapshotting mechanisms
- Lack of low-level control over the running host

But yeah, lambda is way more than enough for most common use cases automating the web

Comment by epolanski 1 hour ago

Your solution sounds very expensive.

Comment by timojeajea 1 hour ago

From our production stats, a median screenshot capture takes 5.7s. Browser-use bills per minute, not per millisecond like Lambda does. As is, it's around 2x more expensive than Lambda for our use-case.

Comment by Reformedot 17 minutes ago

Fair. We bill by minute cause our main use case is web automation. If you compare per minute, Lambdas are 4-6x more expensive than our solution
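
The two figures in this exchange are not necessarily in conflict, and a little arithmetic shows why. The sketch below assumes that per-minute billing rounds every session up to a whole minute and that Lambda bills in 1 ms increments; it ignores per-request fees and memory sizing, so treat the result as a rough consistency check rather than a price comparison.

```python
def billed_ms(duration_ms: int, granularity_ms: int) -> int:
    """Round a duration up to the next multiple of the billing granularity."""
    return -(-duration_ms // granularity_ms) * granularity_ms


capture_ms = 5_700  # median screenshot capture quoted above (5.7s)

per_minute = billed_ms(capture_ms, 60_000)  # billed as one full minute
per_millisecond = billed_ms(capture_ms, 1)  # billed for the time actually used

# How much more time is billed under per-minute rounding.
billed_time_ratio = per_minute / per_millisecond

# If the total still comes out only ~2x more expensive (the figure quoted
# above), the per-unit-time rate would have to be lower by this factor.
implied_rate_advantage = billed_time_ratio / 2
```

Under these assumptions a 5.7s capture is billed for about 10.5 times as much time, so a total that is only about 2x higher implies a per-minute rate roughly 5x lower, which sits inside the 4-6x range claimed in the reply.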

Comment by amarshall 28 minutes ago

Pretty light on details, heavy on fluff. 9.8s to 3.1s was userfaultfd + hugepages, 500ms was PS/2 mouse and… where is the rest of the time to get to 400ms?

Comment by SomaticPirate 1 hour ago

Why is Firecracker needed? Couldn’t this just run in a container directly? I understand some of the isolation concerns but a browser and container breakout is a billion dollar CVE, no?

Comment by WhyNotHugo 10 minutes ago

You can take a snapshot of a microVM and roll back. I've never heard of this being done with containers.
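
For readers unfamiliar with how that works in Firecracker: snapshots are driven through its HTTP API over a Unix socket. The sketch below only builds the request bodies and does not send them. The file paths are hypothetical, and the field names should be checked against the API specification of the Firecracker version in use.

```python
def pause_request():
    """A microVM must be paused before a snapshot is taken."""
    return ("PATCH", "/vm", {"state": "Paused"})


def create_snapshot_request(snapshot_path, mem_path):
    """Write VM state and guest memory to two files on the host."""
    body = {
        "snapshot_type": "Full",
        "snapshot_path": snapshot_path,  # hypothetical path
        "mem_file_path": mem_path,       # hypothetical path
    }
    return ("PUT", "/snapshot/create", body)


def load_snapshot_request(snapshot_path, mem_path, resume=True):
    """Restore into a freshly started Firecracker process (the 'roll back')."""
    body = {
        "snapshot_path": snapshot_path,
        "mem_backend": {"backend_type": "File", "backend_path": mem_path},
        "resume_vm": resume,
    }
    return ("PUT", "/snapshot/load", body)
```

Rolling back means starting a new Firecracker process and loading the snapshot into it, not rewinding the running one. The `mem_backend` type can also be set to `Uffd` to serve guest memory through userfaultfd.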

Comment by arianvanp 1 hour ago

If you follow the kernel mailing list, container breakout exploits are currently a weekly occurrence.

Comment by gregpr07 1 hour ago

Oh really, not a security expert, but could you send me some examples?

Comment by CompuIves 4 hours ago

Very cool to see more use of userfaultfd, really powerful API because you can fully control how and from where memory is loaded during a pagefault.

Comment by rbbydotdev 4 hours ago

> The catch is that regular EC2 is already a VM. AWS runs our host inside its own isolation layer, and then we run browser VMs inside that host. In other words, every browser is a VM inside a VM.

yes but i think there is specifically some ec2s which give you hypervisor access and thereby firecracker too - someone correct me if im wrong?

Comment by roboben 4 hours ago

yes only c8i, m8i and r8i instance types support it. It is called nested virtualization[1]

[1] https://aws.amazon.com/about-aws/whats-new/2026/02/amazon-ec...

Comment by thundergolfer 4 hours ago

Unfortunately supply is quite limited. If you want to horizontally scale on these instances you need to have a good relationship with AWS so they'll give you a big allocation before c9i is a thing.

Comment by roboben 4 hours ago

also i found them much less stable than metal instances running into weird kvm failures

Comment by Reformedot 4 hours ago

Yes, it is. It was a challenge to make it work smoothly without metal. The scaling out speed was one of the main reasons

Comment by torginus 4 hours ago

When we had need of quite big machines (AWS metal instances), we've found the performance differential between metal, and the equivalent size VM was 10-20% for CPU heavy workloads.

Comment by Dibby053 2 hours ago

No mention of the tools/methods used to do the profiling, I think that would be the most interesting part.

Also a bit surprising that a checkpoint with the browser running wouldn't just work. Is this some quirk of firecracker?

Comment by Reformedot 2 hours ago

Checkpoint with Chromium running is possible and will be our next step.

The main blockers are fingerprint injection and profile injection, which are solved already.

It's always a balance of engineering effort & gains. Post-Chromium snapshot lets us save 200ms, which is not that important for 99% of use-cases, but that will come soon since it brings some other benefits (like CPU footprint)

Profiling and tools used are already included with Chromium, they provide nice debugging tools

Comment by messh 2 hours ago

Just use something like https://shellbox.dev instead of FireCracker inside ec2. Much simpler, boxes are up in a couple of seconds, and it is way cheaper.

Comment by Reformedot 2 hours ago

It's not cheaper, slower startup, we lose full control and the environment is not optimized to run Chromium, so we also lose performance

Comment by sandGorgon 2 hours ago

have you tried running android browsers ? we run RL workloads using android browsers. We are having to maintain a fork of https://github.com/budtmo/docker-android/ and android chrome on top. We would rather use browser-use if it had that support.

P.S. we do maintain our fork of a browser for rubric computation...but that is not relevant for this. The infrastructure is what we are looking for.

Comment by Reformedot 25 minutes ago

I've experimented with Android Browsers. The problem is that android VMs are super heavy compared to the resources needed to run just Chromium

Startups are absurdly slow, isolation is harder, etc...

Android bloat is insane, you need to run the entire Java VM to start the browser... It's also harder to fingerprint, and at scale that's something that we need for Browser Use

Cool experiment but not yet production ready, at least for us

Comment by tauntz 2 hours ago

Shameless plug, we have the infra for exactly that use-case. Reach out if you're interested. Email in profile.

Comment by andrewstuart 2 hours ago

Why do this? I can’t see that it would be better than running chrome.

Comment by GrinningFool 3 hours ago

The Internet is drowning in bots, everyone who hosts a site or service is paying the price. At least we have companies like this to make the problem worse.

Comment by embedding-shape 3 hours ago

You have to be a bit more restrictive today yes, but if you weren't already overrun with bots and hacking attempts while hosting a public service many years ago, you probably weren't hosting even a medium-popular website in the first place. Same thing goes today. Slap a rate limit on it and be done with it.
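
One common way to "slap a rate limit on it" is a token bucket per client. This is a minimal sketch of that technique, not a description of what any particular site runs; the class name and the injectable `clock` parameter are choices made for illustration and testing.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so early bursts pass
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; otherwise reject the request."""
        now = self._clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A server would keep one bucket per client key (IP address, API token, or similar) and answer with HTTP 429 when `allow()` returns `False`. It does not help much against traffic spread across many residential IPs.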

Comment by GrinningFool 2 hours ago

Yes, externalize the expense of dealing with it to each operator.

Comment by embedding-shape 2 hours ago

Been like that since what, early 2000s?

Comment by GrinningFool 1 hour ago

It... wasn't okay then. And now it's scaled up immensely.

Comment by swazzy 3 hours ago

> During a burst in traffic, the system, instead of reacting on its own, required humans to adjust it.

Isn't this solvable with autoscaling? how is this not an issue with Firecracker as well?

Comment by Reformedot 2 hours ago

Our previous solution (Unikraft) did not support auto-scaling

That's why we moved to a fully in-house solution with Firecracker and auto-scaling on EC2

Comment by gozzoo 4 hours ago

The article doesn't mention docker at all. I don't understand why containers are not a viable solution for headless browsers.

Comment by simonreiff 2 hours ago

Docker doesn't provide any security. You install Docker on your local laptop, and the container you spin up when you execute `docker run` interacts with your laptop's kernel directly. It provides logical isolation between containers but provides zero protection for your host kernel (assuming you decide to install Docker on a remote server instead).

Firecracker provides isolation between the host kernel, on the one hand, and the guest microVM, on the other hand. So on AWS, you use an Amazon Machine Image (AMI) to specify the OS and other components and libraries installed on an EC2 server such as c5.metal, or if you're using nested virtualization, you can use c8i, r8i, or m8i instances at a discount of about 80%-90% at some performance and other cost, and you bundle Linux along with the Firecracker binary. Then you compile a build artifact including `rootfs` for the Firecracker baked image which is the microVM image (analogous to a Docker image that results from executing `docker build`). But the microVM process has its own virtual kernel and is a guest on the host machine. So for instance, you can place Docker inside the microVM, then the container is executing against the microVM kernel, not the host EC2 kernel. Communication is achieved securely between the two using `vsock` and probably something like `socat` so that data travels, say, from guest RAM to host RAM directly to an S3 quarantine bucket, for instance, without ever touching the host's kernel or filespace.

Comment by mike-grant 2 hours ago

So I've been playing and tweaking for a while with running different browsers in containers. And it took a long while to get working well, but it's doable.

The only issue is scaling, the containers aren't super quick to start (so we keep a spare container ready) and there's plenty of other issues. Also docker isn't really a security boundary so there's issues and concerns there.

Comment by kevmo314 4 hours ago

Their competitive advantage is not so much running the browser but rather making the browser undetectable.

They boast a large residential proxy network too, which tells you all you need to know.

Comment by sroussey 3 hours ago

Yeah, where is the blog post on the residential network?

Comment by torginus 4 hours ago

Or processes. Chrome has builtin process isolation for every browser tab. It starts up darn near instantly, and scores as 'pretty good' as far as sandboxing is concerned.

Comment by Reformedot 4 hours ago

Docker does not isolate, consumes more resources and is slower

Comment by dizhn 3 hours ago

Startup time probably. They can start firecracker from a snapshot state.

Comment by roboben 4 hours ago

docker is not a security boundary but a resource boundary.

Comment by cute_boi 4 hours ago

It is security boundary but a weak one. Escaping from docker is very hard.

Comment by rvz 6 minutes ago

> Escaping from docker is very hard.

You mean a microVM.

A docker LPE (local privilege escalation) requires a kernel exploit. One such as Copyfail would work under docker but not in a microVM.

Comment by wewewedxfgdf 3 hours ago

But Firecracker is not compatible with GPU for Chrome, is that right?

That means Chrome is slow - quite the tradeoff.

Comment by Reformedot 3 hours ago

Our browsers beat competitors in performance too. Chrome uses mainly CPU, not GPU

We support GPU via software tho

Comment by rbbydotdev 4 hours ago

crazy that the maker of chrome(google) and also the owner of a massive amount of cloud services has not made a cloud product identical to this yet

Comment by bfeynman 3 hours ago

they kind of do.. gcp has their lambda equivalent which i believe comes with chromium preinstalled, it's how major search tools like jina work, sure there's probably something about session management that they probably neuter to prevent abuse though

Comment by _pdp_ 3 hours ago

not google but cloudflare has a similar product - though I am not sure how good it is

Comment by ranger_danger 3 hours ago

They have IMO: https://web.archive.org/web/20180823072111/https://cloud.goo...

They just don't have access to giant pools of residential IPs, so too many sites end up blocking all the cloud providers by IP range/ASN anyway, even if they could get through a captcha.

Comment by nickphx 1 hour ago

google has a large amount of "caching servers (GGC)" located in data centers for residential providers all over the world.. They use these servers for a variety of services.. Most of the traffic I have seen from them have been for their "URL preview" service ..

Comment by andrewstuart 2 hours ago

How many tabs do you use per server?

Comment by Reformedot 19 minutes ago

Really depends on the server specs. Tab count depends entirely on memory & CPU availability, not on the infra that runs behind the scenes

But yeah, in one server we can fit hundreds of browsers, or even thousands if we use bigger servers. And each one of them with dozens of tabs, no issue

Comment by latchkey 2 hours ago

Just hot stage a bunch of VMs and then there is no startup time. Every time someone finishes, just start another one and leave it running waiting for the next customer.

Comment by Reformedot 2 hours ago

Browsers can't be reused between customers. They contain sensitive and private data. Everything needs to be isolated and ephemeral.

Comment by latchkey 57 minutes ago

I never suggested reuse.

Comment by Reformedot 22 minutes ago

Starting the VM itself takes 20ms with Firecracker, the slowest part is starting the browser.

So there's no benefit in reusing the VM but not the browser. VM isolation is also important, customers can leave downloads and other files that should not be accessible to freshly created browsers on that same VM.

Comment by latchkey 12 minutes ago

I never suggested reuse.

Comment by jauntywundrkind 3 hours ago

I love that they start with no core pinning, then switch over to having cores pinned.

This could be a bit of a tricky one, but I'd expect Checkpoint Restore In Userspace eventually tackles a lot of this. An image of a running Chromium process on a tmpfs (in-memory filesystem) that can just be launched endlessly tackles the memory slowdown problem, eliminates conventional startup costs. This feels like an ideal CRIU use case.

I imagine there's a lot of things Chrome needs to run though, bits of state to save/restore.

Comment by rfoo 2 hours ago

Assuming CRIU can checkpoint and restore Chrome, and especially recent versions of Chrome, just fine, is a bit of a stretch.

Comment by stogot 4 hours ago

How do you handle browser sessions?

Comment by Reformedot 4 hours ago

We persist profiles to maintain sessions if needed, this includes cookies, session storage and everything needed to keep your account logged in

Comment by zane_shu 59 minutes ago

[flagged]

Comment by huflungdung 1 hour ago

[dead]

Comment by eptcyka 4 hours ago

[flagged]

Comment by Rasbora 2 hours ago

[flagged]

Comment by basilikum 8 minutes ago

Proxy detection != Bot detection

You seem to be affiliated to that clearly slopped service. Your rate of comments seems to be low enough to make it plausible that you're not just shilling, but could you at least disclose your relationship to that product whenever you mention it?

Comment by Reformedot 2 hours ago

> Browser stealth has been a solved problem for some time now

That's not true. Bots can still automate the web and there's demand for products that allow it. It's harder than years ago, but not impossible.

Defenders are always in favor, but the demand for automating the web exists, so research keeps going. There are ways to hide everything, including residential proxies.

For reference, I'm the blog author, and I have another one talking about this topic: https://browser-use.com/posts/bot-detection

Comment by iancarroll 2 hours ago

Your whole account is undisclosed marketing for this service. Fingerprinting in this manner is highly unlikely to be viable - there are too many middleboxes at the TCP layer to try and fingerprint on it.

Comment by gregpr07 1 hour ago

Assuming you run on a mobile proxy - how would you even do that? 1 IP is shared across (potentially) thousands of phones?

Comment by ericpauley 2 hours ago

This only works if it's a proxy instead of an IP-layer tunnel. I suppose you could go a step further and validate whether the total ping is realistic, but this could be trivially solved by putting C2 close to the res-proxy endpoint.

Comment by nisten 4 hours ago

fancy terms aside... they likely just run alpine linux.

Comment by fsuts 1 day ago

“click this button, type this text, read this page, take this screenshot.”

You left in the Ai’s instructions. lol

Interesting read though, thanks

Comment by gregpr07 1 day ago

well that's how browser agents work in a nutshell lol