Agent Safehouse – macOS-native sandboxing for local agents
Posted by atombender 2 days ago
Comments
Comment by e1g 1 day ago
1. I built this because I like my agents to be local. Not in a container, not on a remote server, but running on my finely tuned machine. This lets me run all agents on full-auto, in peace.
2. Yes, it's just a policy-generator for sandbox-exec. IMO, that's the best part about the project - no dependencies, no fancy tech, no virtualization. But I did put in many hours to identify the minimum required permissions for agents to continue working with auto-updates, keychain integration, and pasting images, etc. There are notes about my investigations into what each agent needs https://agent-safehouse.dev/docs/agent-investigations/ (AI-generated)
3. You don't even need the rest of the project; you can use just the Policy Builder to generate a single sandbox-exec policy you can put into your dotfiles https://agent-safehouse.dev/policy-builder.html
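To make the "just a policy in your dotfiles" idea concrete, here is a sketch of what that might look like; the alias name, profile path, and -D parameter are illustrative assumptions, not the Policy Builder's actual output:

```shell
# What "just a generated policy in your dotfiles" might look like
# (illustrative; generate the real .sb file with the Policy Builder).
mkdir -p /tmp/dotfiles
cat > /tmp/dotfiles/aliases.sh <<'EOF'
# -f points at the generated SBPL profile; -D passes params the profile reads
alias claude-safe='sandbox-exec -f ~/.config/agent.sb -D CWD="$PWD" claude'
EOF
grep -c 'sandbox-exec' /tmp/dotfiles/aliases.sh   # -> 1
```

The point is that the whole sandbox then travels with your dotfiles: one profile file plus one alias, no installed tooling.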
Comment by atombender 1 day ago
I've seen sandbox policy documents for agents before, but this is the first ready-to-use app I've come across.
I've only had a couple of points of friction so far:
- Files like .gitconfig and .gitignore in the home folder aren't accessible, and can't be made accessible without granting read-only access to the whole home folder, I think?
- Process access is limited, so I can't ask Claude to run lldb or pkill or other commands that can help me debug local processes.
More fine-grained control would be really nice.
Comment by e1g 1 day ago
For handling global rules (like ~/.gitconfig and ~/.gitignore), I keep a local policy file that whitelists my "shared globals" paths, and I tell Safehouse to include that policy by default. I just updated the README with an example that might be useful[1]. I also enabled access to ~/.gitignore by default as it's a common enough case.
For process management, there is a blurry line about how much to allow without undermining the sandboxing concept. I just added new integrations[2] to allow more process control and lldb, but I don't know this area well. You can try cloning the repo, asking your agents to tweak the rules in the repo until your use-case works, and send a PR - I'll merge it!
Alternatively, using the "custom policy" feature above, you can selectively grant broad access to your tools (you can use log monitoring to see rejections, and then add more permissions to the policy file)
[1] https://github.com/eugene1g/agent-safehouse?tab=readme-ov-fi...
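As a sketch of the "shared globals" include described above — the paths and the string-append form are illustrative assumptions, and the log predicate is one common way to watch for Seatbelt denials, not Safehouse's documented workflow:

```shell
# Hypothetical "shared globals" include policy; the real format is in
# the README example, and these paths are just illustrative.
cat > /tmp/safehouse-globals.sb <<'EOF'
(allow file-read*
  (literal (string-append (param "HOME") "/.gitconfig"))
  (literal (string-append (param "HOME") "/.gitignore")))
EOF
# On macOS, watch denials while the agent runs, then extend the file:
#   log stream --style compact --predicate 'sender == "Sandbox"'
grep -c 'literal' /tmp/safehouse-globals.sb   # -> 2
```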
Comment by atombender 1 day ago
The process control policy is kind of niche and should definitely not be something agents are always allowed to do, so having a shorthand flag like the one you added in that pull request is the right choice.
I'm sure Anthropic and the other major players will catch up and add better sandboxing eventually, but for now, this tool has been exactly what I needed — many thanks!
I also wonder if this could have been a plugin or MCP server? I was using this plugin [1] for a bit, and it appears to use a "PreToolUse" hook that modifies every tool invocation. The benefit there would be that you could even change the Safehouse settings inside a session, e.g. turn process control on or off.
Comment by indeyets 1 day ago
Comment by atombender 1 day ago
Comment by TheBengaluruGuy 1 day ago
Comment by ai_fry_ur_brain 1 day ago
Comment by bouke 1 day ago
- I asked the agent to change my global git username, Codex asked my permission to execute `git config --global user.name "Botje"` and after I granted permission, it was able to change this global configuration.
- I asked it to list my home directory and it was able to (this time without Codex asking for permission).
Comment by asabla 1 day ago
I've been trying to get microsandbox to play nicely, but this is much closer to what I actually need.
I glanced through the site and the script but couldn't see any obvious gotchas.
Any you've found so far which hasn't been documented yet?
Comment by e1g 1 day ago
But lately I’ve been using agents to test via browsers, and starting headless browsers from the agent is flaky. I’m working on that but it’s hard to find a secure default to run Chrome.
In the repo, I have policies for running the Claude desktop app and VSCode inside the same sandbox (so you can do yolo mode there too), so there is hope for sandboxing headless Chrome as well.
Comment by asabla 1 day ago
Did a migration myself last week from playwright mcp to playwright-cli, which has been playing much nicer so far. I guess you'd run into the same issues you've already mentioned about running headless Chrome in one of these sandboxes.
I'll for sure keep an eye out for updates.
Kudos to the project!
Comment by e1g 1 day ago
Comment by pizlonator 1 day ago
Thanks for making it!
Comment by siwatanejo 1 day ago
Yet the first thing I find in your README is that to install your tool I need to trust some random server to serve me a .sh file that I will execute on my computer (not sure if with sudo... but still).
Come on man, give me a tarball :)
EDIT: PS: before someone gives me the typical "but you could have malware in that tarball too!!!", well, it's easier to inspect what's inside the tarball and compare it to the sources of the repo, maybe also take a look at the CI of the repo to see if the tarball is really generated automatically from the contents of the repo ;)
Comment by e1g 1 day ago
Alternatively, you can feed these instructions to your LLM and have it generate you a minimal policy file and a shell wrapper https://agent-safehouse.dev/llm-instructions.txt
Comment by camkego 1 day ago
Anyway, thanks for building Agent Safehouse.
Comment by e1g 1 day ago
Comment by oneplane 1 day ago
I've been trying out similar things to help internal teams use systems and languages like Rego (for Open Policy Agent) by giving them a visual, more 'a la carte' experience when starting out, so they don't have to jump straight into learning all the syntax and patterns of a language they may have never seen before.
Comment by e1g 1 day ago
Comment by chrisweekly 1 day ago
Comment by dummydummy1234 1 day ago
Comment by Quiark 1 day ago
Comment by aa-jv 1 day ago
How is this any different than running some random .sh script?
The assumption is that package-manager code is reviewed - that same assumption can be applied just as equitably to wget'ed .sh files.
tl;dr - you are reviewing everything you ever run on your system, right?
Comment by quietsegfault 1 day ago
Comment by cortesoft 1 day ago
Comment by quietsegfault 1 day ago
Comment by scosman 1 day ago
Comment by quietsegfault 1 day ago
I prefer containerization because it gives me a repeatable environment that I know works, whereas on my system things can change as the OS updates and applications evolve.
But I can understand the benefit of sandboxing for sure! Thank you.
Comment by scosman 1 day ago
If you prefer containers, use containers.
Comment by sunnybeetroot 1 day ago
Comment by dionian 1 day ago
Comment by paxys 1 day ago
Comment by scosman 1 day ago
The downside is that it requires access to more than it technically needs (Claude keys for example). I’m working on a version where you sandbox the agent’s Bash tool, not the agent itself. https://github.com/Kiln-AI/Kilntainers
Comment by snthpy 8 hours ago
How about using bash-tool to intercept the commands and then passing them onto the containers?
Comment by scosman 6 hours ago
If you're using an agent tool that already includes an existing bash tool which calls host OS, just remove that one and add this.
Comment by snthpy 7 hours ago
Then how about running Claude Code or your harness of choice inside bubblewrap with a shim/stub for the base binary?
Comment by bootlooped 1 day ago
Claude Code seemed to be able to reach outside its own sandbox sometimes, so I lost trust in it. Manually wrapping it in sandbox-exec solved the issue.
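One hand-rolled wrapper shape for this — an assumption, not bootlooped's exact setup. It fails closed (no profile file, no agent), and the echo makes it a dry run so the sketch is inspectable on any OS:

```shell
# Fail-closed sandbox-exec wrapper sketch (names illustrative).
run_wrapped() {
  profile="$1"; shift
  [ -f "$profile" ] || { echo "refusing to run without a profile" >&2; return 1; }
  # On macOS, drop the echo to actually launch the agent sandboxed.
  echo sandbox-exec -f "$profile" "$@"
}
run_wrapped /tmp/nonexistent.sb claude || echo blocked   # -> blocked
```

Failing closed matters here: if the profile path is wrong, you want no agent at all rather than an unsandboxed one.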
Comment by hmokiguess 1 day ago
Comment by zmmmmm 1 day ago
I honestly think that sandboxing is currently THE major challenge that needs to be solved for the tech to fully realise its potential. Yes the early adopters will YOLO it and run agents natively. It won't fly at all longer term or in regulated or more conservative corporate environments, let alone production systems where critical operations or data are in play.
The challenge is that we need a much more sophisticated version of sandboxing than anybody has made before. We can start with network, file system and execute permissions - but we need way more than that. For example, if you really need an agent to use a browser to test your application in a live environment, capture screenshots and debug them - you have to give it all kinds of permissions that go beyond what can be constrained with a traditional sandboxing model. If it has to interact with resources that cost money (say, create cloud resources) then you need an agent aware cloud cost / billing constraint.
Somehow all this needs to be pulled together into an actual cohesive approach that people can work with in a practical way.
Comment by andybak 1 day ago
Have you considered that it's unsolvable? Or - at least - there is an irreconcilable tension between capability and safety. And people will always choose the former if given the choice.
Comment by zmmmmm 1 day ago
The most unsolvable part is prompt injection. For that you need full tracking of the trust level of content the agent is exposed to and a method of linking that to what actions it has accessible to it. I actually think this needs to be fully integrated to the sandboxing solution. Once an agent is "tainted" its sandbox should inherently shrink down to the radius where risk is balanced with value. For example, my fully trusted agent might have a balance of $1000 in my AWS account, while a tainted one might have that reduced to $50.
So another aspect of sandboxing is making the security model dynamic.
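The taint-to-budget mapping described in this comment could be sketched as a trivial policy selector; every name and number here is made up, purely to illustrate the shape of the idea:

```shell
# Illustrative only: map the agent's current trust level to a sandbox
# profile and a spend cap (names and numbers are invented).
policy_for_trust() {
  case "$1" in
    trusted) echo "profile=full.sb budget_usd=1000" ;;
    tainted) echo "profile=minimal.sb budget_usd=50" ;;
    *)       echo "profile=deny-all.sb budget_usd=0" ;;
  esac
}
policy_for_trust tainted   # -> profile=minimal.sb budget_usd=50
```

The hard part is of course the taint tracking itself, not this lookup.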
Comment by skybrian 1 day ago
One idea is to have the coding agent write a security policy in plan mode before reading any untrusted files:
Comment by schmuhblaster 1 day ago
[0] https://deepclause.substack.com/p/static-taint-analysis-for-...
Comment by silverstream 1 day ago
Comment by e1g 1 day ago
That's not the case with Agent Safehouse - you can give your agent access to select ~/.dotfiles and env, but by default it gets nothing (outside of CWD)
Comment by ericlevine 1 day ago
[1] https://www.tomshardware.com/tech-industry/artificial-intell...
Comment by zmmmmm 1 day ago
Comment by simonw 1 day ago
This looks like a competent wrapper around sandbox-exec. I've seen a whole lot of similar wrappers emerging over the past few months.
What I really need is help figuring out which ones are trustworthy.
I think this needs to take the form of documentation combined with clearly explained and readable automated tests.
Most sandboxes - including sandbox-exec itself - are massively under-documented.
If I am going to trust them, I need both detailed documentation and proof that they work as advertised.
Comment by e1g 1 day ago
Your point is totally fair for evaluating security tooling. A few notes -
1. I implemented this in Bash to avoid having an opaque binary in the way.
2. All sandbox-exec profiles are split up into individual files by specific agent/integration, and are easily auditable (https://github.com/eugene1g/agent-safehouse/tree/main/profil...)
3. There are E2E tests validating sandboxing behavior under real agents
4. You don't even need the Safehouse Bash wrapper, and can use the Policy Builder to generate a static policy file with minimal permissions that you can feed to sandbox-exec directly (https://agent-safehouse.dev/policy-builder). Or feed the repo to your LLMs and have them write your own policy from the many examples.
5. This whole repo should be a StrongDM-style readme to copy & paste to your clanker. I might just do that "refactor", but for now I added LLM instructions to create your own sandbox-exec profiles https://agent-safehouse.dev/llm-instructions.txt
Comment by big_toast 1 day ago
Would xcodebuild work in this context? Presumably I'd watch a log (or have an agent) and add permissions until it works?
Comment by e1g 1 day ago
Yes, Safehouse should work for xcodebuild workloads in the way you described - try to run it, watch for failures, extend the profile, try again. Your agent can do this in a loop by itself - just feed it the repo as there are many integrations that are not enabled by default that will help it.
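The "watch failures, extend the profile" loop can even be partly mechanized; here is a sketch that turns one Seatbelt denial log line into an SBPL allow rule. The log line format is an assumption — check your own `log stream` output for the exact shape:

```shell
# Sketch: convert a (assumed-format) Seatbelt denial into an allow rule.
denial='deny(1) file-read-data /Library/Developer/CommandLineTools/usr/bin/cc'
op=$(echo "$denial" | awk '{print $2}')    # the denied operation
path=$(echo "$denial" | awk '{print $3}')  # the denied path
printf '(allow %s (literal "%s"))\n' "$op" "$path"
```

Appending the printed rule to the profile and re-running is exactly the loop an agent can drive by itself.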
Comment by big_toast 1 day ago
I read a little from sandvault and they suggest sandbox-exec doesn't allow recursive sandboxing, so you need to set flags on xcodebuild and swift to not sandbox in addition to the correct SBPL policy.
(I don't think sandvault has a swift/xcode specific policy because they're dumping everything into a sandvault userspace. And it doesn't really concern itself with networking afaict either.)
Comment by e1g 14 hours ago
This also applies to sandboxing an Electron app: Electron has its own built-in sandboxing via sandbox-exec, so if you're wrapping an Electron app in your own sandbox, you have to disable that inner sandbox (with Electron's --no-sandbox or ELECTRON_DISABLE_SANDBOX=1). In the repo, I have examples of the minimal sandbox-exec rules required to run Claude Code[1] and VSCode[2] (so you can use --dangerously-skip-permissions in the desktop app and VSCode extension)
[1] https://github.com/eugene1g/agent-safehouse/blob/a7377924efa...
[2] https://github.com/eugene1g/agent-safehouse/blob/a7377924efa...
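A minimal wrapper shape for the Electron case above — the profile name and app path are placeholders, and the echo keeps it a dry run so the sketch runs anywhere:

```shell
# Outer-sandbox wrapper for an Electron app: switch off the app's own
# Chromium sandbox first so it doesn't collide with the outer profile.
run_electron_sandboxed() {
  export ELECTRON_DISABLE_SANDBOX=1
  # Dry-run echo; on macOS, remove the echo to actually launch.
  echo sandbox-exec -f electron-app.sb "$1" --no-sandbox
}
run_electron_sandboxed /Applications/Some.app/Contents/MacOS/Electron
```

Both the env var and the flag are shown because different Electron versions and launch paths respect one or the other.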
Comment by kstenerud 1 day ago
Comment by okanesen 1 day ago
Comment by vasco 1 day ago
Comment by xyzzy_plugh 2 days ago
I like that it's just a shell script.
I do wish that there was a simple way to sandbox programs with an overlay or copy-on-write semantics (or better yet bind mounts). I don't care if, in the process of doing some work, an LLM agent modifies .bashrc -- I only care if it modifies _my_ .bashrc
Comment by e1g 1 day ago
Re “overlay FS” - I too wish this was possible on Macs, but the closest I got was restricting agents to be read-only outside of CWD which, after a few turns, bullies them into working in $TMP. Not the same though.
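Roughly the shape of "read-only outside of CWD" in SBPL — illustrative only; a working profile needs many more allowances to keep an agent alive:

```shell
# Deny writes by default, then re-allow them only under the project
# dir and /private/tmp (last matching rule wins in SBPL).
cat > /tmp/ro-outside-cwd.sb <<'EOF'
(version 1)
(allow default)
(deny file-write*)
(allow file-write*
  (subpath (param "CWD"))
  (subpath "/private/tmp"))
EOF
grep -c 'subpath' /tmp/ro-outside-cwd.sb   # -> 2
```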
Comment by kstenerud 1 day ago
┌─ YOLO shell ──────────────────────┬─ Outer shell ─────────────────────┐
│ │ │
│ yoloai new myproject . -a │ │
│ │ │
│ # Tell the agent what to do, │ │
│ # have it commit when done. │ │
│ │ yoloai diff myproject │
│ │ yoloai apply myproject │
│ │ # Review and accept the commits. │
│ │ │
│ # ... next task, next commit ... │ │
│ │ yoloai apply myproject │
│ │ │
│ │ # When you have a good set of │
│ │ # commits, push: │
│ │ git push │
│ │ │
│ │ # Done? Tear it down: │
│ │ yoloai destroy myproject │
└───────────────────────────────────┴───────────────────────────────────┘
Works with Docker, Seatbelt, and Tart backends (I've even had it build an iOS app inside a Seatbelt container).
Comment by dbmikus 1 day ago
It's tailored to play nicely with Git: spin up sandboxes from the CLI, expose TCP/UDP ports of apps to check your work, and, if running hosted sandboxes, share the sandbox URLs with teammates. I basically want running sandboxed agents to be as easy as `git clone ...`.
Docs are early and edges are rough. This week I'm starting to dogfood all my dev using Amika. Feedback is super appreciated!
FYI: we are also a startup, but local sandbox mgmt will stay OSS.
Comment by xyzzy_plugh 1 day ago
Just use Docker, or a VM.
The other issue is that this does not facilitate unpredictable file access -- I have to mount everything up front. Sometimes you don't know what you need. And even then copying in and out is very different from a true overlay.
Comment by dbmikus 1 day ago
It sounds like a big part of your use case is to safely give an agent control of your computer? Like, for things besides codegen?
We're probably not going to directly support that type of use case, since we're focused on code-gen agents and migrating their work between localhost and the cloud.
We are going to add dynamic filesystem mounting, for after sandbox creation. Haven't figured out the exact implementation yet. Might be a FUSE layer we build ourselves. Mutagen is pretty interesting as well here.
Comment by divmain 1 day ago
Comment by xyzzy_plugh 1 day ago
The main issue I want to solve is unexpected writes to arbitrary paths should be allowed but ultimately discarded. macOS simply doesn't offer a way to namespace the filesystem in that way.
Comment by divmain 1 day ago
Comment by tuananh 1 day ago
Comment by e1g 1 day ago
Comment by rvz 1 day ago
Apple is likely preparing to replace it with a more secure alternative, and all it takes is someone finding one vulnerability (or several) in sandbox-exec to give everyone a wake-up call about why they were using it in the first place.
I predict that there is a CVE lurking in sandbox-exec waiting to be discovered.
Comment by TheTon 1 day ago
Comment by rvz 1 day ago
The security researchers will leverage every part of the OS stack to bypass the sandbox in XNU which they have done multiple times.
Now, there is a good reason for them to break the sandbox thanks to the hype of 'agents'. It could even take a single file to break it. [0]
> My guess is sandbox-exec is deprecated more because it never was adequately documented rather than because it’s flawed in some way.
You do not know that. I am saying that it has been bypassed before, and having it used all over the OS doesn't mean anything; it actually makes it worse.
[0] https://the-sequence.com/crashone-cve-2025-24277-macos-sandb...
Comment by TheTon 1 day ago
I hear what you're saying about the deprecation status, but as I and others mentioned, the fact that the underlying functionality is heavily used throughout the OS by non-deprecated features puts it on more solid footing than a technology that's an island unto itself.
Comment by JimDabell 1 day ago
Comment by rvz 1 day ago
Apple can still decide to change it for any reason, regardless of who uses it, since it is undocumented for their use anyway.
> I’m not sure Apple could remove it even if they were sufficiently motivated to.
It can take multiple security issues for them to remove it.
Comment by TheTaytay 1 day ago
Comment by ptak_dev 1 day ago
Problem 1: the agent does something destructive by accident — rm -rf, hard git revert, writes to the wrong config. Filesystem sandboxing solves this well.
Problem 2: the agent does something destructive because it was prompt-injected via a file it read. Sandboxing doesn't help here — the agent already has your credentials in memory before it reads the malicious file.
The only real answer to problem 2 is either never give the agent credentials that can do real damage, or have a separate process auditing tool calls before they execute. Neither is fully solved yet.
Agent Safehouse is a clean solution to problem 1. That's genuinely useful and worth having even if problem 2 remains open.
Comment by wilkystyle 1 day ago
> Matchlock is a CLI tool for running AI agents in ephemeral microVMs - with network allowlisting, secret injection via MITM proxy, and VM-level isolation. Your secrets never enter the VM.
In a nutshell, it solves problem #2 through a combination of a network allowlist and secret masking/injection on a per-host basis. Secrets are never actually exposed inside the sandbox. A placeholder string is used inside the sandbox, and the mitm proxy layer replaces the placeholder string with the actual secret key outside of the sandbox before sending the request along to its original destination.
Furthermore, because secrets are available to the sandbox only on a per-host basis, you can specify that you want to share OPENAI_API_KEY only with api.openai.com, and that is the only host for which the placeholder string will be replaced with the actual secret value.
edit to actually add the link
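A toy version of the placeholder substitution described above — Matchlock does this per-host inside its MITM proxy, not in shell; the key, placeholder, and host names here are all illustrative:

```shell
# The real key never enters the sandbox; the agent only ever sees
# the placeholder. Substitution happens outside, per-host.
REAL_KEY='sk-real-123'                      # lives outside the sandbox
placeholder='MATCHLOCK_PLACEHOLDER_OPENAI'  # what the agent sees
request="Authorization: Bearer $placeholder"
host='api.openai.com'
if [ "$host" = 'api.openai.com' ]; then
  request=$(printf '%s' "$request" | sed "s/$placeholder/$REAL_KEY/")
fi
echo "$request"   # -> Authorization: Bearer sk-real-123
```

A request to any other host would keep the placeholder, so an exfiltrated "key" is worthless.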
Comment by tcbrah 1 day ago
the short-lived scoped credentials approach someone mentioned upthread is probably the best practical mitigation right now. but even that breaks down when the agent legitimately needs broad access to do its job - like if it's refactoring across a monorepo it kinda needs write access to everything.
i think the actual answer long term is something closer to capability-based security where each tool call gets its own token scoped to exactly what that specific action needs. but nobody has built that yet in a way that doesn't make the agent 10x slower.
Comment by eelke 1 day ago
Comment by brap 1 day ago
1. Destructive by accident
2. Destructive because it was prompt-injected
And
1. Fucks up filesystem
2. Fucks up external systems via credentials
Comment by pash 1 day ago
Basically, give an agent its own unprivileged user account (interacting with it via sudo, SSH, and shared directories), then add sandbox-exec on top for finer-grained control of access to system resources.
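The setup described above, as a dry run — commands are printed rather than executed, and the sysadminctl invocation is an assumption about macOS user creation:

```shell
# Print the plan for a dedicated unprivileged agent account
# (illustrative; review each command before running for real).
for cmd in \
  'sysadminctl -addUser agent' \
  'mkdir -p /Users/Shared/agent-work && chmod 770 /Users/Shared/agent-work' \
  'sudo -u agent sandbox-exec -f agent.sb claude'
do
  echo "$cmd"
done
```

The shared directory is the only overlap between your account and the agent's, which keeps the blast radius easy to reason about.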
Comment by mikemcquaid 1 day ago
I also found the author to be helpful and responsive and the tool to be nicely minimalistic rather than the usual vibe coded ever expanding mess.
‘brew install sandvault’ and running ‘sv’ should get you going.
(full disclosure: I created the Homebrew formula and submitted a few PRs to the project)
Comment by TheTaytay 1 day ago
Comment by mkagenius 1 day ago
$ container system start
$ container run -d --name myubuntu ubuntu:latest sleep infinity
$ container exec myubuntu bash -c "apt-get update -qq && apt-get install -y openssh-server"
$ container exec myubuntu bash -c "
apt-get install -y curl &&
curl -fsSL https://deb.nodesource.com/setup_lts.x |
bash - &&
apt-get install -y nodejs
"
$ container exec myubuntu npm install -g @anthropic-ai/claude-code
$ container exec myubuntu claude --version
Comment by emmelaich 1 day ago
Comment by terhechte 1 day ago
Comment by sunnybeetroot 1 day ago
Comment by varenc 1 day ago
Its manpage has been saying it's deprecated for a decade now, yet we're continuing to find great uses for it. And the 'App Sandbox' replacement doesn't work at all for use cases like this where end users define their own sandbox rules. Hope Apple sees this usage and stops any plans to actually deprecate sandbox-exec. I recall a bunch of macOS internal services also rely on it.
Comment by jasomill 1 day ago
In particular, has the profile language ever been documented by anything other than the examples used by the OS and third parties reverse engineering it?
Comment by varenc 5 hours ago
Comment by davidcann 1 day ago
Comment by alpb 1 day ago
Comment by crossroadsguy 1 day ago
The solution could be in two parts:
1. OS bringing better and easier-to-use limitations: more granular permissions, plus install-time options and defaults visible to the user right there, which the user can reject with choices like:
- “ask later”
- “no”
- “fuck no”
with ELI5-level GUIs (and well documented). Hell, a lot of this is already solved on mobile OSes. All while not taking tools away from the user who wants to go inside and open things up (with clear intention and effort; without having to notarize some shit or pay someone).
2. Then apps[1] being forced to adhere to those, or never getting installed.
[1] So no treating of agents as some “other” kinds of apps. Just limit it for every app (unless user explicitly decides to open things up).
It would also be a great time to nuke despicable messes like Electron Helpers, and the way app devs consider it completely fine to install a trillion other "things" when the user installed just one app, without explaining it up front (forcing them to keep their apps' tentacles simple and limited)
Comment by tl2do 1 day ago
Around last summer (July–August 2025), I desperately needed a sandbox like this. I had multiple disasters with Claude Code and other early AI models. The worst was when Claude Code did a hard git revert to restore a single file, which wiped out ~1000 lines of development work across multiple files.
But now, as of March 2026, at least in my experience, agents have become more reliable. With proper guardrails in claude.md and built-in safety measures, I haven't had a major incident in about 3 months.
That said, layering multiple safeguards is always recommended—your software assets are your assets. I'd still recommend using something like this. But things are changing, bit by bit.
Comment by e1g 1 day ago
Also, I don’t want it to be even theoretically possible for some file in node_modules to inject instructions to send my dotfiles to China.
Comment by jeremyjh 1 day ago
Comment by ramoz 1 day ago
Comment by bilalq 1 day ago
Comment by ZYbCRq22HbJ2y7 1 day ago
Comment by synparb 1 day ago
Comment by webpolis 23 hours ago
We built Cyqle (https://cyqle.in) partly around this idea — each session is a full Linux desktop that's cryptographically wiped on close (AES-256 key destroyed, data unrecoverable). Agents can do whatever they want inside, and the blast radius is zero by design. No residual state, no host OS exposure.
The tradeoff is latency and connectivity requirements. For teams already doing cloud-based dev work, it's a natural fit. For local-first workflows where you need offline capability or sub-50ms responsiveness, something like Agent Safehouse makes more sense.
Both approaches are worth having — the threat model differs depending on whether you're more worried about data exfiltration or local system compromise.
Comment by agent5ravi 1 day ago
Comment by SiteMgrAI 1 day ago
Comment by garganzol 2 days ago
Having real macOS Docker would solve the problem this project solves, and 1001 other problems.
Comment by mkagenius 1 day ago
You can use it to completely sandbox claude code too.
1. Coderunner - https://github.com/instavm/coderunner
Comment by dpe82 1 day ago
Comment by PlasmaPower 1 day ago
Comment by dpe82 1 day ago
But too many people just automatically equate docker with strong secure isolation and... well, it can be, sometimes, depending a hundred other variables. Thus the reminder; to foster conversations like this.
Comment by fredoliveira 1 day ago
Comment by egorfine 1 day ago
I'm very slowly working on a mock Docker implementation for macOS that uses an ephemeral VM to launch a true guest macOS and performs commands per the Dockerfile, copies files, etc. I use it internally for builds. No public repo yet though. Not sure if there is demand.
Comment by hrmtst93837 1 day ago
A practical path is ephemeral macOS VMs using Apple's Virtualization.framework, coupled with APFS copy-on-write clones for fast provisioning, or limited per-process isolation via Seatbelt and the hardened runtime. That respects Apple's licensing (which restricts macOS VMs to Apple hardware) and gives strong isolation, at the cost of higher RAM and storage overhead compared with Linux containers.
Comment by PufPufPuf 1 day ago
Comment by garganzol 1 day ago
It's not a pleasure to run them in a mutable environment where everything has a floating state as I do now. Native Docker for macOS would totally solve that.
Comment by qalmakka 1 day ago
What would a Phillips screwdriver bring over a flathead screwdriver? Sometimes you don't want/need the flathead screwdriver, simple as that. There are macOS-specific jobs you need to run in macOS, such as xcode toolchains etc. You can try cross compiling, but it's a pain and ridiculous given that 100% of every other OS supports containers natively (including windows). It's clear to me that Apple is trying to make the ratio jobs/#MacMinis as small as possible
Comment by hirvi74 1 day ago
Comment by hkonte 1 day ago
One thing I'd add: sandboxing the execution environment only solves half the problem. The other half is the prompt itself — if the agent's instructions are ambiguous or poorly scoped, sandboxing just contains the damage from a confused agent rather than preventing it.
I built flompt (https://flompt.dev) to address the instruction side — a visual prompt builder that decomposes agent prompts into 12 semantic blocks (role, constraints, objective, output format, etc.) and compiles them to Claude-optimized XML. Tight instructions + sandboxed execution = actually safe agents.
Comment by hsaliak 1 day ago
https://github.com/hsaliak/sacre_bleu very rough around the edges, but it works. In the past there were apps that either behaved well, or had malicious intent, but with these LLM backed apps, you are going to see apps that want to behave well, but cannot guarantee it. We are going to see a lot of experimentation in this space until the UX settles!
Comment by ClaudioAnthrop 1 day ago
Comment by carderne 1 day ago
For those using pi, I've built something similar[1] that works on macOS+Linux, using sandbox-exec/bubblewrap. Only benefit over OP is that there's some UX for temporarily/permanently bypassing blocks.
Comment by e1g 1 day ago
Big love for Pi - it was the first integration I added to Safehouse. I wanted something that offers strong guarantees across all agents (I test and write them nonstop), has no dependencies (e.g., the Node runtime), and is easy to customize, so I didn't use the Anthropic sandbox-runtime.
Comment by carderne 1 day ago
Yeah I think for general use the transparency of what your thing does is really great compared to a pile of TypeScript and whatnot.
Comment by gbrindisi 1 day ago
Code here: https://github.com/gbrindisi/agentbox
Comment by w10-1 1 day ago
If/since AI agents work continuously, it seems like running macOS in a VM (via the virtualization framework directly) is the most secure solution and requires a lot less verification than any sandboxing script. (Critical feature: no access to my keychain.)
AI agents are not at all like container deploys which come and go with sub-second speed, and need to be small enough that you can run many at a time. (If you're running local inference, that's the primary resource hog.)
I'm not too worried about multiple agents in the same VM stepping on each other. I give them different work-trees or directory trees; if they step on each other 1% of the time, it's not a risk to the bare-metal system.
Not sure if I'm missing something...
Comment by sunnybeetroot 1 day ago
Comment by llimllib 1 day ago
Comment by devrimozcay 1 day ago
One thing we've been seeing with production AI agents is that the real risk isn't just filesystem access, but the chain of actions agents can take once they have tool access.
Even a simple log-reading capability can escalate if the agent starts triggering automated workflows or calling internal APIs.
We've been experimenting with incident-aware agents that detect abnormal behavior and automatically generate incident reports with suggested fixes.
Curious if you're thinking about integrating behavioral monitoring or anomaly detection on top of the sandbox layer.
Comment by matifali 1 day ago
Comment by NegativeLatency 1 day ago
I’m assuming it’s similar to why people run plex, web servers, file sharing, etc
Also personally I’d rather not pay monthly fees for stuff if it can be avoided.
Comment by paxys 1 day ago
Comment by mikodin 1 day ago
Comment by deevus 1 day ago
It supports running on a TrueNAS SCALE server, or via Incus (local or remote). I'm still working on tightening the security posture, but for many types of AI workflows it will be more than sufficient.
Comment by sunnybeetroot 1 day ago
Comment by Tadbitrusty 1 day ago
Comment by abhisek 1 day ago
However the challenge is, sandbox profiles (rules) are always workload-specific. How do you define “least privilege” for a workload and then enforce it through the sandbox?
Which is why general sandboxes won't be useful or even feasible. The value is in observing and probably auto-generating a baseline policy for a given workload.
Wrong or overly relaxed policies would make the sandbox ineffective against the real threats it is expected to protect against.
Comment by srid 1 day ago
Comment by sunir 1 day ago
p.s. thanks for making this; timely as I am playing whackamole with sandboxing right now.
Comment by e1g 1 day ago
Comment by brutuscat 1 day ago
Comment by mlysk 1 day ago
Comment by inoki 1 day ago
Comment by e1g 1 day ago
Comment by Finbarr 1 day ago
I built yolobox to solve this using docker/apple containers: https://github.com/finbarr/yolobox
Comment by jeff_antseed 1 day ago
Been watching microsandbox, but it's pretty early. Landlock is the Linux kernel primitive that could theoretically enable something like this, but nobody's built the nice policy layer on top yet.
Curious if anyone has a good solution for the "agent running on a remote Linux server" case. The threat model is a bit different anyway (no iMessage/keychain to protect), but filesystem and network containment still matter a lot.
Comment by carderne 1 day ago
[1] https://github.com/anthropic-experimental/sandbox-runtime [2] https://github.com/carderne/sandbox-runtime
Comment by edf13 1 day ago
Taking more of an automated supervisor approach with limited manual approval for edge cases.
Grith.ai
Comment by guimbuilds 1 day ago
Comment by kxrm 1 day ago
I use clippy with rust and the only thing I had to add was:
(subpath "/Library/Developer/CommandLineTools")
Comment by andai 1 day ago
Then I realized the only thing I care about on my local machine is "don't touch my files", and Unix users solved that in 1970. So I just run agents as "agent" user.
I think running it on a separate machine is nicer though, because it's even simpler and safer than that. (My solution still requires careful setup and regular overhead when you get permission issues. "It's on another laptop, and my stuff isn't" has neither of those problems.)
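The 1970-era primitive behind the "agent" user approach is plain Unix file permissions. A small self-contained sketch of the mode bits that would stop a separate user from reading your files:

```python
import os
import stat
import tempfile

# Create a file readable only by its owner, then inspect its mode bits.
# This owner/group/other split is the same mechanism that blocks a
# dedicated "agent" user from reading another user's personal files.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o600)  # rw for owner, nothing for group/other

mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))                   # -> 0o600
print(bool(mode & stat.S_IROTH))   # can "other" users read it? -> False
os.remove(path)
```

The "regular overhead" mentioned above is exactly this: every shared path needs its group/other bits (or an ACL) adjusted by hand.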
Comment by rwky 1 day ago
Comment by kevincloudsec 18 hours ago
Comment by datapolitical 1 day ago
But given how fast agents are moving, I would be shocked if such tools were not already being built.
Comment by ashishb 1 day ago
Comment by cuber_messenger 1 day ago
Comment by ashniu123 1 day ago
Comment by rishabhaiover 1 day ago
Comment by gozucito 2 days ago
Comment by e1g 1 day ago
Comment by arianvanp 1 day ago
Comment by grun 1 day ago
Comment by dbmikus 1 day ago
How does this compare with Codex's and Claude's built-in sandboxing?
Comment by e1g 1 day ago
Codex: IIRC, only shell commands are sandboxed; the actual agent runtime is not.
Comment by dbmikus 1 day ago
Comment by wek 1 day ago
Comment by boxedemp 1 day ago
Comment by snthpy 7 hours ago
Comment by treexs 1 day ago
Comment by e1g 1 day ago
The alternative would be “no site”, which is still somehow worse.
Comment by sagarpatil 1 day ago
Comment by vivid242 1 day ago
Comment by cowpig 1 day ago
I'm involved with a project building something very similar, which we literally open sourced an alpha version of last week:
https://github.com/GreyhavenHQ/greywall
It's a bit different in that:
- We started with Linux
- It is a binary that wraps the agent runtime
- It runs alongside a proxy which captures all traffic to provide a visibility layer
- Rules can be changed dynamically at runtime
I am so happy this problem is getting the attention it deserves!
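The "rules can be changed dynamically at runtime" point can be boiled down to a tiny sketch. This is not greywall's actual API — the file path and rule format here are hypothetical — but it shows the idea of a proxy re-reading an allowlist on every check:

```python
import fnmatch
import json
from pathlib import Path

# Hypothetical runtime-reloadable egress allowlist: a JSON file holding
# glob patterns for permitted hosts. The proxy re-reads it on every
# check, so edits take effect without restarting the agent.
RULES = Path("/tmp/egress-rules.json")

def allowed(host: str) -> bool:
    patterns = json.loads(RULES.read_text()) if RULES.exists() else []
    return any(fnmatch.fnmatch(host, p) for p in patterns)
```

A real proxy would cache on mtime and match against the full request, but the reload-per-decision shape is what makes runtime rule changes possible.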
Comment by m3kw9 1 day ago
Comment by croes 1 day ago
Why always the fixation on the hardware?
Comment by cjbarber 1 day ago
Comment by ai_fry_ur_brain 1 day ago
Comment by nemo44x 1 day ago
All the issues we get from AI today (hallucinations, goal shift, context decay, etc.) get amplified unbelievably fast once you begin scaling agents out, because failures cascade. The risk being you go to bed and when you wake up your entire infrastructure is gone lol.
Comment by BLACKCRAB 49 minutes ago
Comment by CloakHQ 1 day ago
Comment by oliver_dr 1 day ago
Comment by babbagegao 1 day ago
Comment by octoclaw 1 day ago
Comment by bhekanik 1 day ago
Comment by babbagegao 1 day ago
Comment by yowang 1 day ago
Comment by rex_claw 1 day ago
Comment by babbagegao 1 day ago
Comment by maciver 1 day ago
Comment by naomi_kynes 1 day ago
Most setups handle this awkwardly: fire a webhook, write to a log, hope the human is watching. The sandbox keeps the agent contained, but doesn't give it a clean "pause and ask" primitive. The agent either guesses (risky) or silently fails (frustrating).
Seems like there are two layers: the security boundary (sandbox-exec, containers, etc.) and the communication boundary (how does a contained agent reach the human?). This project nails the first. The second is still awkward for most setups.
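A minimal "pause and ask" primitive for a contained agent can be as dumb as a pair of files in a directory the human watches. Everything here is a hypothetical sketch (paths, file naming, and the polling loop are illustrative), but it shows the blocking hand-off shape:

```python
import json
import time
import uuid
from pathlib import Path

# Hypothetical "pause and ask" primitive: the sandboxed agent drops a
# request file into a shared directory, then blocks until a matching
# answer file appears (written by the human, or by a supervisor UI).
INBOX = Path("/tmp/agent-approvals")

def ask_human(question: str, timeout: float = 300.0) -> str:
    INBOX.mkdir(exist_ok=True)
    req_id = uuid.uuid4().hex
    (INBOX / f"{req_id}.request").write_text(json.dumps({"question": question}))
    answer_path = INBOX / f"{req_id}.answer"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if answer_path.exists():
            return answer_path.read_text().strip()
        time.sleep(0.1)  # polling; inotify/kqueue would be cleaner
    raise TimeoutError("no human response")
```

The nice property is that the sandbox only needs to allow one writable directory; the communication boundary rides on the security boundary instead of punching through it.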
Comment by niyikiza 1 day ago
Comment by e1g 1 day ago
Comment by jamiemallers 1 day ago
Comment by 10keane 1 day ago
Comment by moehj 1 day ago
Comment by aplomb1026 1 day ago
Comment by Agent_Builder 1 day ago
Comment by openclaw01 1 day ago
Comment by devonkelley 1 day ago
Comment by poopiokaka 1 day ago
Comment by bschmidt97979 1 day ago
Comment by gnanagurusrgs 1 day ago
sandbox-profiles is a solid primitive for local agents. The missing piece in production is the tool layer — even a sandboxed agent can still make dangerous API calls if the MCP tools it has access to aren't individually authed and scoped.
The real stack is: sandbox the runtime (what Agent Safehouse does) + scope the tools (what we do with JIT OAuth at the MCP layer). Neither alone is enough.
Nice work shipping this.
https://www.arcade.dev/blog/ai-agent-auth-challenges-develop...
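The "scope the tools" layer described above reduces, at its core, to a per-tool scope check on whatever token the agent carries. A toy sketch (tool names, scope strings, and the function are all made up for illustration — this is not Arcade's or MCP's actual API):

```python
# Toy per-tool authorization: each tool requires an explicit scope, and
# an agent's token only carries the scopes it was granted just-in-time.
# Tool and scope names are hypothetical.
TOOL_SCOPES = {
    "read_logs": "logs:read",
    "open_pr": "repos:write",
    "delete_repo": "repos:admin",
}

def authorize(tool: str, token_scopes: set[str]) -> bool:
    required = TOOL_SCOPES.get(tool)
    # Unknown tools are denied outright; known tools need their scope.
    return required is not None and required in token_scopes
```

The sandbox bounds what the runtime can touch locally; a check like this bounds what each tool call can do remotely.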