Agent Safehouse – macOS-native sandboxing for local agents
Posted by atombender 2 days ago
Comments
Comment by e1g 1 day ago
1. I built this because I like my agents to be local. Not in a container, not on a remote server, but running on my finely tuned machine. This lets me run all agents on full-auto, in peace.
2. Yes, it's just a policy-generator for sandbox-exec. IMO, that's the best part about the project - no dependencies, no fancy tech, no virtualization. But I did put in many hours to identify the minimum required permissions for agents to continue working with auto-updates, keychain integration, and pasting images, etc. There are notes about my investigations into what each agent needs https://agent-safehouse.dev/docs/agent-investigations/ (AI-generated)
3. You don't even need the rest of the project; you can use just the Policy Builder to generate a single sandbox-exec policy you can put into your dotfiles https://agent-safehouse.dev/policy-builder.html
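To make the "just a policy in your dotfiles" idea concrete, here is a sketch of what that might look like; the alias name, profile path, and -D parameter are illustrative assumptions, not the Policy Builder's actual output:

```shell
# What "just a generated policy in your dotfiles" might look like
# (illustrative; generate the real .sb file with the Policy Builder).
mkdir -p /tmp/dotfiles
cat > /tmp/dotfiles/aliases.sh <<'EOF'
# -f points at the generated SBPL profile; -D passes params the profile reads
alias claude-safe='sandbox-exec -f ~/.config/agent.sb -D CWD="$PWD" claude'
EOF
grep -c 'sandbox-exec' /tmp/dotfiles/aliases.sh   # -> 1
```

The point is that the whole sandbox then travels with your dotfiles: one profile file plus one alias, no installed tooling.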
Comment by atombender 1 day ago
I've seen sandbox policy documents for agents before, but this is the first ready-to-use app I've come across.
I've only had a couple of points of friction so far:
- Files like .gitconfig and .gitignore in the home folder aren't accessible, and can't be made accessible without granting read-only access to the whole home folder, I think?
- Process access is limited, so I can't ask Claude to run lldb or pkill or other commands that can help me debug local processes.
More fine-grained control would be really nice.
Comment by e1g 1 day ago
For handling global rules (like ~/.gitconfig and ~/.gitignore), I keep a local policy file that whitelists my "shared globals" paths, and I tell Safehouse to include that policy by default. I just updated the README with an example that might be useful[1]. I also enabled access to ~/.gitignore by default as it's a common enough case.
For process management, there is a blurry line about how much to allow without undermining the sandboxing concept. I just added new integrations[2] to allow more process control and lldb, but I don't know this area well. You can try cloning the repo, asking your agents to tweak the rules in the repo until your use-case works, and send a PR - I'll merge it!
Alternatively, using the "custom policy" feature above, you can selectively grant broad access to your tools (you can use log monitoring to see rejections, and then add more permissions to the policy file)
[1] https://github.com/eugene1g/agent-safehouse?tab=readme-ov-fi...
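As a sketch of the "shared globals" include described above — the paths and the string-append form are illustrative assumptions, and the log predicate is one common way to watch for Seatbelt denials, not Safehouse's documented workflow:

```shell
# Hypothetical "shared globals" include policy; the real format is in
# the README example, and these paths are just illustrative.
cat > /tmp/safehouse-globals.sb <<'EOF'
(allow file-read*
  (literal (string-append (param "HOME") "/.gitconfig"))
  (literal (string-append (param "HOME") "/.gitignore")))
EOF
# On macOS, watch denials while the agent runs, then extend the file:
#   log stream --style compact --predicate 'sender == "Sandbox"'
grep -c 'literal' /tmp/safehouse-globals.sb   # -> 2
```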
Comment by atombender 1 day ago
The process control policy is kind of niche and should definitely not be something agents are always allowed to do, so having a shorthand flag like the one you added in that pull request is the right choice.
I'm sure Anthropic and the other major players will catch up and add better sandboxing eventually, but for now, this tool has been exactly what I needed — many thanks!
I also wonder if this could have been a plugin or MCP server? I was using this plugin [1] for a bit, and it appears to use a "PreToolUse" hook that modifies every tool invocation. The benefit there would be that you could even change the Safehouse settings inside a session, e.g. turn process control on or off.
Comment by indeyets 1 day ago
Comment by atombender 1 day ago
Comment by TheBengaluruGuy 1 day ago
Comment by ai_fry_ur_brain 1 day ago
Comment by bouke 1 day ago
- I asked the agent to change my global git username, Codex asked my permission to execute `git config --global user.name "Botje"` and after I granted permission, it was able to change this global configuration.
- I asked it to list my home directory and it was able to (this time without Codex asking for permission).
Comment by asabla 1 day ago
I've been trying to get microsandbox to play nicely, but this is much closer to what I actually need.
I glanced through the site and the script but couldn't see any obvious gotchas.
Any you've found so far which hasn't been documented yet?
Comment by e1g 1 day ago
But lately I’ve been using agents to test via browsers, and starting headless browsers from the agent is flaky. I’m working on that but it’s hard to find a secure default to run Chrome.
In the repo, I have policies for running the Claude desktop app and VSCode inside the same sandbox (so you can do yolo mode there too), so there is hope for sandboxing headless Chrome as well.
Comment by asabla 1 day ago
Did a migration myself last week from playwright mcp to playwright-cli, which has been playing much nicer so far. I guess you'd run into the same issues you've already mentioned about running headless Chrome in one of these sandboxes.
I'll for sure keep an eye out for updates.
Kudos to the project!
Comment by e1g 1 day ago
Comment by pizlonator 1 day ago
Thanks for making it!
Comment by siwatanejo 1 day ago
Yet the first thing I find in your README is that to install your tool I need to trust some random server to serve me a .sh file that I will execute on my computer (not sure if with sudo... but still).
Come on man, give me a tarball :)
EDIT: PS: before someone gives me the typical "but you could have malware in that tarball too!!!", well, it's easier to inspect what's inside the tarball and compare it to the sources of the repo, maybe also take a look at the CI of the repo to see if the tarball is really generated automatically from the contents of the repo ;)
Comment by e1g 1 day ago
Alternatively, you can feed these instructions to your LLM and have it generate you a minimal policy file and a shell wrapper https://agent-safehouse.dev/llm-instructions.txt
Comment by camkego 1 day ago
Anyway, thanks for building Agent Safehouse.
Comment by e1g 1 day ago
Comment by oneplane 1 day ago
I've been trying out similar things to help internal teams use systems and languages like Rego (for Open Policy Agent) by giving them a visual, more 'a la carte' experience when starting out, so they don't have to jump straight into learning all the syntax and patterns of a language they may have never seen before.
Comment by e1g 1 day ago
Comment by chrisweekly 1 day ago
Comment by dummydummy1234 1 day ago
Comment by Quiark 1 day ago
Comment by aa-jv 1 day ago
How is this any different than running some random .sh script?
The assumption is that package-manager code is reviewed - that same assumption can be applied just as equitably to wget'ed .sh files.
tl;dr - you are reviewing everything you ever run on your system, right?
Comment by quietsegfault 1 day ago
Comment by cortesoft 1 day ago
Comment by quietsegfault 1 day ago
Comment by scosman 1 day ago
Comment by quietsegfault 1 day ago
I prefer containerization because it gives me a repeatable environment that I know works, whereas on my system things can change as the OS updates and applications evolve.
But I can understand the benefit of sandboxing for sure! Thank you.
Comment by scosman 1 day ago
If you prefer containers, use containers.
Comment by sunnybeetroot 1 day ago
Comment by dionian 1 day ago
Comment by paxys 1 day ago
Comment by scosman 1 day ago
The downside is that it requires access to more than it technically needs (Claude keys for example). I’m working on a version where you sandbox the agent’s Bash tool, not the agent itself. https://github.com/Kiln-AI/Kilntainers
Comment by snthpy 8 hours ago
How about using bash-tool to intercept the commands and then passing them onto the containers?
Comment by scosman 6 hours ago
If you're using an agent tool that already includes an existing bash tool which calls host OS, just remove that one and add this.
Comment by snthpy 7 hours ago
Then how about running Claude Code or your harness of choice inside bubblewrap with a shim/stub for the base binary?
Comment by bootlooped 1 day ago
Claude Code seemed to be able to reach outside its own sandbox sometimes, so I lost trust in it. Manually wrapping it in sandbox-exec solved the issue.
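One hand-rolled wrapper shape for this — an assumption, not bootlooped's exact setup. It fails closed (no profile file, no agent), and the echo makes it a dry run so the sketch is inspectable on any OS:

```shell
# Fail-closed sandbox-exec wrapper sketch (names illustrative).
run_wrapped() {
  profile="$1"; shift
  [ -f "$profile" ] || { echo "refusing to run without a profile" >&2; return 1; }
  # On macOS, drop the echo to actually launch the agent sandboxed.
  echo sandbox-exec -f "$profile" "$@"
}
run_wrapped /tmp/nonexistent.sb claude || echo blocked   # -> blocked
```

Failing closed matters here: if the profile path is wrong, you want no agent at all rather than an unsandboxed one.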
Comment by hmokiguess 1 day ago
Comment by zmmmmm 1 day ago
I honestly think that sandboxing is currently THE major challenge that needs to be solved for the tech to fully realise its potential. Yes the early adopters will YOLO it and run agents natively. It won't fly at all longer term or in regulated or more conservative corporate environments, let alone production systems where critical operations or data are in play.
The challenge is that we need a much more sophisticated version of sandboxing than anybody has made before. We can start with network, file system and execute permissions - but we need way more than that. For example, if you really need an agent to use a browser to test your application in a live environment, capture screenshots and debug them - you have to give it all kinds of permissions that go beyond what can be constrained with a traditional sandboxing model. If it has to interact with resources that cost money (say, create cloud resources) then you need an agent aware cloud cost / billing constraint.
Somehow all this needs to be pulled together into an actual cohesive approach that people can work with in a practical way.
Comment by andybak 1 day ago
Have you considered that it's unsolvable? Or - at least - there is an irreconcilable tension between capability and safety. And people will always choose the former if given the choice.
Comment by zmmmmm 1 day ago
The most unsolvable part is prompt injection. For that you need full tracking of the trust level of content the agent is exposed to and a method of linking that to what actions it has accessible to it. I actually think this needs to be fully integrated to the sandboxing solution. Once an agent is "tainted" its sandbox should inherently shrink down to the radius where risk is balanced with value. For example, my fully trusted agent might have a balance of $1000 in my AWS account, while a tainted one might have that reduced to $50.
So another aspect of sandboxing is making the security model dynamic.
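The taint-to-budget mapping described in this comment could be sketched as a trivial policy selector; every name and number here is made up, purely to illustrate the shape of the idea:

```shell
# Illustrative only: map the agent's current trust level to a sandbox
# profile and a spend cap (names and numbers are invented).
policy_for_trust() {
  case "$1" in
    trusted) echo "profile=full.sb budget_usd=1000" ;;
    tainted) echo "profile=minimal.sb budget_usd=50" ;;
    *)       echo "profile=deny-all.sb budget_usd=0" ;;
  esac
}
policy_for_trust tainted   # -> profile=minimal.sb budget_usd=50
```

The hard part is of course the taint tracking itself, not this lookup.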
Comment by skybrian 1 day ago
One idea is to have the coding agent write a security policy in plan mode before reading any untrusted files:
Comment by schmuhblaster 1 day ago
[0] https://deepclause.substack.com/p/static-taint-analysis-for-...
Comment by silverstream 1 day ago
Comment by e1g 1 day ago
That's not the case with Agent Safehouse - you can give your agent access to select ~/.dotfiles and env, but by default it gets nothing (outside of CWD)
Comment by ericlevine 1 day ago
[1] https://www.tomshardware.com/tech-industry/artificial-intell...
Comment by zmmmmm 1 day ago
Comment by simonw 1 day ago
This looks like a competent wrapper around sandbox-exec. I've seen a whole lot of similar wrappers emerging over the past few months.
What I really need is help figuring out which ones are trustworthy.
I think this needs to take the form of documentation combined with clearly explained and readable automated tests.
Most sandboxes - including sandbox-exec itself - are massively under-documented.
If I am going to trust them, I need both detailed documentation and proof that they work as advertised.
Comment by e1g 1 day ago
Your point is totally fair for evaluating security tooling. A few notes -
1. I implemented this in Bash to avoid having an opaque binary in the way.
2. All sandbox-exec profiles are split up into individual files by specific agent/integration, and are easily auditable (https://github.com/eugene1g/agent-safehouse/tree/main/profil...)
3. There are E2E tests validating sandboxing behavior under real agents
4. You don't even need the Safehouse Bash wrapper, and can use the Policy Builder to generate a static policy file with minimal permissions that you can feed to sandbox-exec directly (https://agent-safehouse.dev/policy-builder). Or feed the repo to your LLMs and have them write your own policy from the many examples.
5. This whole repo should be a StrongDM-style readme to copy & paste to your clanker. I might just do that "refactor", but for now I added LLM instructions to create your own sandbox-exec profiles https://agent-safehouse.dev/llm-instructions.txt
Comment by big_toast 1 day ago
Would xcodebuild work in this context? Presumably I'd watch a log (or have an agent) and add permissions until it works?
Comment by e1g 1 day ago
Yes, Safehouse should work for xcodebuild workloads in the way you described - try to run it, watch for failures, extend the profile, try again. Your agent can do this in a loop by itself - just feed it the repo as there are many integrations that are not enabled by default that will help it.
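The "watch failures, extend the profile" loop can even be partly mechanized; here is a sketch that turns one Seatbelt denial log line into an SBPL allow rule. The log line format is an assumption — check your own `log stream` output for the exact shape:

```shell
# Sketch: convert a (assumed-format) Seatbelt denial into an allow rule.
denial='deny(1) file-read-data /Library/Developer/CommandLineTools/usr/bin/cc'
op=$(echo "$denial" | awk '{print $2}')    # the denied operation
path=$(echo "$denial" | awk '{print $3}')  # the denied path
printf '(allow %s (literal "%s"))\n' "$op" "$path"
```

Appending the printed rule to the profile and re-running is exactly the loop an agent can drive by itself.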
Comment by big_toast 1 day ago
I read a little from sandvault and they suggest sandbox-exec doesn't allow recursive sandboxing, so you need to set flags on xcodebuild and swift to not sandbox in addition to the correct SBPL policy.
(I don't think sandvault has a swift/xcode specific policy because they're dumping everything into a sandvault userspace. And it doesn't really concern itself with networking afaict either.)
Comment by e1g 14 hours ago
This also applies to sandboxing an Electron app: Electron has its own built-in sandboxing via sandbox-exec, so if you're wrapping an Electron app in your own sandbox, you have to disable that inner sandbox (with Electron's --no-sandbox or ELECTRON_DISABLE_SANDBOX=1). In the repo, I have examples of the minimal sandbox-exec rules required to run Claude Code[1] and VSCode[2] (so you can use --dangerously-skip-permissions in the desktop app and VSCode extension)
[1] https://github.com/eugene1g/agent-safehouse/blob/a7377924efa...
[2] https://github.com/eugene1g/agent-safehouse/blob/a7377924efa...
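A minimal wrapper shape for the Electron case above — the profile name and app path are placeholders, and the echo keeps it a dry run so the sketch runs anywhere:

```shell
# Outer-sandbox wrapper for an Electron app: switch off the app's own
# Chromium sandbox first so it doesn't collide with the outer profile.
run_electron_sandboxed() {
  export ELECTRON_DISABLE_SANDBOX=1
  # Dry-run echo; on macOS, remove the echo to actually launch.
  echo sandbox-exec -f electron-app.sb "$1" --no-sandbox
}
run_electron_sandboxed /Applications/Some.app/Contents/MacOS/Electron
```

Both the env var and the flag are shown because different Electron versions and launch paths respect one or the other.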
Comment by kstenerud 1 day ago
Comment by okanesen 1 day ago
Comment by vasco 1 day ago
Comment by xyzzy_plugh 2 days ago
I like that it's just a shell script.
I do wish that there was a simple way to sandbox programs with an overlay or copy-on-write semantics (or better yet bind mounts). I don't care if, in the process of doing some work, an LLM agent modifies .bashrc -- I only care if it modifies _my_ .bashrc
Comment by e1g 1 day ago
Re “overlay FS” - I too wish this was possible on Macs, but the closest I got was restricting agents to be read-only outside of CWD which, after a few turns, bullies them into working in $TMP. Not the same though.
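Roughly the shape of "read-only outside of CWD" in SBPL — illustrative only; a working profile needs many more allowances to keep an agent alive:

```shell
# Deny writes by default, then re-allow them only under the project
# dir and /private/tmp (last matching rule wins in SBPL).
cat > /tmp/ro-outside-cwd.sb <<'EOF'
(version 1)
(allow default)
(deny file-write*)
(allow file-write*
  (subpath (param "CWD"))
  (subpath "/private/tmp"))
EOF
grep -c 'subpath' /tmp/ro-outside-cwd.sb   # -> 2
```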
Comment by kstenerud 1 day ago
┌─ YOLO shell ──────────────────────┬─ Outer shell ─────────────────────┐
│ │ │
│ yoloai new myproject . -a │ │
│ │ │
│ # Tell the agent what to do, │ │
│ # have it commit when done. │ │
│ │ yoloai diff myproject │
│ │ yoloai apply myproject │
│ │ # Review and accept the commits. │
│ │ │
│ # ... next task, next commit ... │ │
│ │ yoloai apply myproject │
│ │ │
│ │ # When you have a good set of │
│ │ # commits, push: │
│ │ git push │
│ │ │
│ │ # Done? Tear it down: │
│ │ yoloai destroy myproject │
└───────────────────────────────────┴───────────────────────────────────┘
Works with Docker, Seatbelt, and Tart backends (I've even had it build an iOS app inside a Seatbelt container).
Comment by dbmikus 1 day ago
It's tailored to play nicely with Git: spin up sandboxes from the CLI, expose TCP/UDP ports of apps to check your work, and, if running hosted sandboxes, share the sandbox URLs with teammates. I basically want running sandboxed agents to be as easy as `git clone ...`.
Docs are early and edges are rough. This week I'm starting to dogfood all my dev using Amika. Feedback is super appreciated!
FYI: we are also a startup, but local sandbox mgmt will stay OSS.
Comment by xyzzy_plugh 1 day ago
Just use Docker, or a VM.
The other issue is that this does not facilitate unpredictable file access -- I have to mount everything up front. Sometimes you don't know what you need. And even then copying in and out is very different from a true overlay.
Comment by dbmikus 1 day ago
It sounds like a big part of your use case is to safely give an agent control of your computer? Like, for things besides codegen?
We're probably not going to directly support that type of use case, since we're focused on code-gen agents and migrating their work between localhost and the cloud.
We are going to add dynamic filesystem mounting, for after sandbox creation. Haven't figured out the exact implementation yet. Might be a FUSE layer we build ourselves. Mutagen is pretty interesting as well here.
Comment by divmain 1 day ago
Comment by xyzzy_plugh 1 day ago
The main issue I want to solve is unexpected writes to arbitrary paths should be allowed but ultimately discarded. macOS simply doesn't offer a way to namespace the filesystem in that way.
Comment by divmain 1 day ago
Comment by tuananh 1 day ago
Comment by e1g 1 day ago
Comment by rvz 1 day ago
Apple is likely preparing to replace it with a more secure alternative, and all it takes is someone finding one vulnerability (or several) in sandbox-exec to give everyone a wake-up call about why they were using it in the first place.
I predict that there is a CVE lurking in sandbox-exec waiting to be discovered.
Comment by TheTon 1 day ago
Comment by rvz 1 day ago
The security researchers will leverage every part of the OS stack to bypass the sandbox in XNU which they have done multiple times.
Now, there is a good reason for them to break the sandbox thanks to the hype of 'agents'. It could even take a single file to break it. [0]
> My guess is sandbox-exec is deprecated more because it never was adequately documented rather than because it’s flawed in some way.
You do not know that. I am saying that it has been bypassed before, and having it used all over the OS doesn't mean anything; it actually makes it worse.
[0] https://the-sequence.com/crashone-cve-2025-24277-macos-sandb...
Comment by TheTon 1 day ago
I hear what you're saying about the deprecation status, but as I and others mentioned, the fact that the underlying functionality is heavily used throughout the OS by non-deprecated features puts it on more solid footing than a technology that's an island unto itself.
Comment by JimDabell 1 day ago
Comment by rvz 1 day ago
Apple can still decide to change it for any reason, regardless of who uses it, since it is undocumented for their use anyway.
> I’m not sure Apple could remove it even if they were sufficiently motivated to.
It can take multiple security issues for them to remove it.
Comment by TheTaytay 1 day ago
Comment by ptak_dev 1 day ago
Problem 1: the agent does something destructive by accident — rm -rf, hard git revert, writes to the wrong config. Filesystem sandboxing solves this well.
Problem 2: the agent does something destructive because it was prompt-injected via a file it read. Sandboxing doesn't help here — the agent already has your credentials in memory before it reads the malicious file.
The only real answer to problem 2 is either never give the agent credentials that can do real damage, or have a separate process auditing tool calls before they execute. Neither is fully solved yet.
Agent Safehouse is a clean solution to problem 1. That's genuinely useful and worth having even if problem 2 remains open.
Comment by wilkystyle 1 day ago
> Matchlock is a CLI tool for running AI agents in ephemeral microVMs - with network allowlisting, secret injection via MITM proxy, and VM-level isolation. Your secrets never enter the VM.
In a nutshell, it solves problem #2 through a combination of a network allowlist and secret masking/injection on a per-host basis. Secrets are never actually exposed inside the sandbox. A placeholder string is used inside the sandbox, and the mitm proxy layer replaces the placeholder string with the actual secret key outside of the sandbox before sending the request along to its original destination.
Furthermore, because secrets are available to the sandbox only on a per-host basis, you can specify that you want to share OPENAI_API_KEY only with api.openai.com, and that is the only host for which the placeholder string will be replaced with the actual secret value.
edit to actually add the link
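A toy version of the placeholder substitution described above — Matchlock does this per-host inside its MITM proxy, not in shell; the key, placeholder, and host names here are all illustrative:

```shell
# The real key never enters the sandbox; the agent only ever sees
# the placeholder. Substitution happens outside, per-host.
REAL_KEY='sk-real-123'                      # lives outside the sandbox
placeholder='MATCHLOCK_PLACEHOLDER_OPENAI'  # what the agent sees
request="Authorization: Bearer $placeholder"
host='api.openai.com'
if [ "$host" = 'api.openai.com' ]; then
  request=$(printf '%s' "$request" | sed "s/$placeholder/$REAL_KEY/")
fi
echo "$request"   # -> Authorization: Bearer sk-real-123
```

A request to any other host would keep the placeholder, so an exfiltrated "key" is worthless.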
Comment by tcbrah 1 day ago
the short-lived scoped credentials approach someone mentioned upthread is probably the best practical mitigation right now. but even that breaks down when the agent legitimately needs broad access to do its job - like if it's refactoring across a monorepo it kinda needs write access to everything.
i think the actual answer long term is something closer to capability-based security where each tool call gets its own token scoped to exactly what that specific action needs. but nobody has built that yet in a way that doesn't make the agent 10x slower.
Comment by eelke 1 day ago
Comment by brap 1 day ago
1. Destructive by accident
2. Destructive because it was prompt-injected
And
1. Fucks up filesystem
2. Fucks up external systems via credentials
Comment by pash 1 day ago
Basically, give an agent its own unprivileged user account (interacting with it via sudo, SSH, and shared directories), then add sandbox-exec on top for finer-grained control of access to system resources.
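The setup described above, as a dry run — commands are printed rather than executed, and the sysadminctl invocation is an assumption about macOS user creation:

```shell
# Print the plan for a dedicated unprivileged agent account
# (illustrative; review each command before running for real).
for cmd in \
  'sysadminctl -addUser agent' \
  'mkdir -p /Users/Shared/agent-work && chmod 770 /Users/Shared/agent-work' \
  'sudo -u agent sandbox-exec -f agent.sb claude'
do
  echo "$cmd"
done
```

The shared directory is the only overlap between your account and the agent's, which keeps the blast radius easy to reason about.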
Comment by mikemcquaid 1 day ago
I also found the author to be helpful and responsive and the tool to be nicely minimalistic rather than the usual vibe coded ever expanding mess.
‘brew install sandvault’ and running ‘sv’ should get you going.
(full disclosure: I created the Homebrew formula and submitted a few PRs to the project)
Comment by TheTaytay 1 day ago
Comment by mkagenius 1 day ago
$ container system start
$ container run -d --name myubuntu ubuntu:latest sleep infinity
$ container exec myubuntu bash -c "apt-get update -qq && apt-get install -y openssh-server"
$ container exec myubuntu bash -c "
apt-get install -y curl &&
curl -fsSL https://deb.nodesource.com/setup_lts.x |
bash - &&
apt-get install -y nodejs
"
$ container exec myubuntu npm install -g @anthropic-ai/claude-code
$ container exec myubuntu claude --version
Comment by emmelaich 1 day ago
Comment by terhechte 1 day ago
Comment by sunnybeetroot 1 day ago
Comment by varenc 1 day ago
Its manpage has been saying it's deprecated for a decade now, yet we're continuing to find great uses for it. And the 'App Sandbox' replacement doesn't work at all for use cases like this where end users define their own sandbox rules. Hope Apple sees this usage and stops any plans to actually deprecate sandbox-exec. I recall a bunch of macOS internal services also rely on it.
Comment by jasomill 1 day ago
In particular, has the profile language ever been documented by anything other than the examples used by the OS and third parties reverse engineering it?
Comment by varenc 5 hours ago
Comment by davidcann 1 day ago
Comment by alpb 1 day ago
Comment by crossroadsguy 1 day ago
The solution could be in two parts:
1. OS bringing better and easier-to-use limitations: more granular permissions, plus install-time options and defaults visible to the user right there, which the user can reject with choices like:
- “ask later”
- “no”
- “fuck no”
with ELI5-level GUIs (and well documented). Hell, a lot of this is already solved on mobile OSes. All while not taking tools away from the user who wants to go inside and open things up (with clear intention and effort; without having to notarize some shit or pay someone).
2. Then apps[1] being forced to adhere to those, or never getting installed.
[1] So no treating of agents as some “other” kinds of apps. Just limit it for every app (unless user explicitly decides to open things up).
It would also be a great time to nuke despicable messes like Electron Helpers, and the way app devs consider it completely fine to install a trillion other "things" when the user installed just one app, without explaining it up front (forcing them to keep their apps' tentacles simple and limited)
Comment by tl2do 1 day ago
Around last summer (July–August 2025), I desperately needed a sandbox like this. I had multiple disasters with Claude Code and other early AI models. The worst was when Claude Code did a hard git revert to restore a single file, which wiped out ~1000 lines of development work across multiple files.
But now, as of March 2026, at least in my experience, agents have become more reliable. With proper guardrails in claude.md and built-in safety measures, I haven't had a major incident in about 3 months.
That said, layering multiple safeguards is always recommended—your software assets are your assets. I'd still recommend using something like this. But things are changing, bit by bit.
Comment by e1g 1 day ago
Also, I don’t want it to be even theoretically possible for some file in node_modules to inject instructions to send my dotfiles to China.
Comment by jeremyjh 1 day ago
Comment by ramoz 1 day ago
Comment by bilalq 1 day ago
Comment by ZYbCRq22HbJ2y7 1 day ago
Comment by synparb 1 day ago
Comment by webpolis 23 hours ago
We built Cyqle (https://cyqle.in) partly around this idea — each session is a full Linux desktop that's cryptographically wiped on close (AES-256 key destroyed, data unrecoverable). Agents can do whatever they want inside, and the blast radius is zero by design. No residual state, no host OS exposure.
The tradeoff is latency and connectivity requirements. For teams already doing cloud-based dev work, it's a natural fit. For local-first workflows where you need offline capability or sub-50ms responsiveness, something like Agent Safehouse makes more sense.
Both approaches are worth having — the threat model differs depending on whether you're more worried about data exfiltration or local system compromise.
Comment by agent5ravi 1 day ago
Comment by SiteMgrAI 1 day ago
Comment by garganzol 2 days ago
Having real macOS Docker would solve the problem this project solves, and 1001 other problems.
Comment by mkagenius 1 day ago
You can use it to completely sandbox claude code too.
1. Coderunner - https://github.com/instavm/coderunner
Comment by dpe82 1 day ago
Comment by PlasmaPower 1 day ago
Comment by dpe82 1 day ago
But too many people just automatically equate docker with strong secure isolation and... well, it can be, sometimes, depending a hundred other variables. Thus the reminder; to foster conversations like this.
Comment by fredoliveira 1 day ago
Comment by egorfine 1 day ago
I'm very slowly working on a mock Docker implementation for macOS that uses an ephemeral VM to launch a true guest macOS and performs commands per the Dockerfile, copies files, etc. I use it internally for builds. No public repo yet though. Not sure if there is demand.
Comment by hrmtst93837 1 day ago
A practical path is ephemeral macOS VMs using Apple's Virtualization.framework, coupled with APFS copy-on-write clones for fast provisioning, or limited per-process isolation via Seatbelt and the hardened runtime. That respects Apple's licensing (which restricts macOS VMs to Apple hardware) and gives strong isolation, at the cost of higher RAM and storage overhead compared with Linux containers.
Comment by PufPufPuf 1 day ago
Comment by garganzol 1 day ago
It's not a pleasure to run them in a mutable environment where everything has a floating state as I do now. Native Docker for macOS would totally solve that.
Comment by qalmakka 1 day ago
What would a Phillips screwdriver bring over a flathead screwdriver? Sometimes you don't want/need the flathead screwdriver, simple as that. There are macOS-specific jobs you need to run in macOS, such as xcode toolchains etc. You can try cross compiling, but it's a pain and ridiculous given that 100% of every other OS supports containers natively (including windows). It's clear to me that Apple is trying to make the ratio jobs/#MacMinis as small as possible
Comment by hirvi74 1 day ago
Comment by hkonte 1 day ago
One thing I'd add: sandboxing the execution environment only solves half the problem. The other half is the prompt itself — if the agent's instructions are ambiguous or poorly scoped, sandboxing just contains the damage from a confused agent rather than preventing it.
I built flompt (https://flompt.dev) to address the instruction side — a visual prompt builder that decomposes agent prompts into 12 semantic blocks (role, constraints, objective, output format, etc.) and compiles them to Claude-optimized XML. Tight instructions + sandboxed execution = actually safe agents.
Comment by hsaliak 1 day ago
https://github.com/hsaliak/sacre_bleu very rough around the edges, but it works. In the past there were apps that either behaved well, or had malicious intent, but with these LLM backed apps, you are going to see apps that want to behave well, but cannot guarantee it. We are going to see a lot of experimentation in this space until the UX settles!
Comment by ClaudioAnthrop 1 day ago
Comment by carderne 1 day ago
For those using pi, I've built something similar[1] that works on macOS+Linux, using sandbox-exec/bubblewrap. Only benefit over OP is that there's some UX for temporarily/permanently bypassing blocks.
Comment by e1g 1 day ago
Big love for Pi - it was the first integration I added to Safehouse. I wanted something that offers strong guarantees across all agents (I test and write them nonstop), has no dependencies (e.g., the Node runtime), and is easy to customize, so I didn't use the Anthropic sandbox-runtime.
Comment by carderne 1 day ago
Yeah I think for general use the transparency of what your thing does is really great compared to a pile of TypeScript and whatnot.
Comment by gbrindisi 1 day ago
Code here: https://github.com/gbrindisi/agentbox
Comment by w10-1 1 day ago
If/since AI agents work continuously, it seems like running macOS in a VM (via the virtualization framework directly) is the most secure solution and requires a lot less verification than any sandboxing script. (Critical feature: no access to my keychain.)
AI agents are not at all like container deploys which come and go with sub-second speed, and need to be small enough that you can run many at a time. (If you're running local inference, that's the primary resource hog.)
I'm not too worried about multiple agents in the same VM stepping on each other. I give them different work-trees or directory trees; if they step on each other 1% of the time, it's not a risk to the bare-metal system.
Not sure if I'm missing something...
Comment by sunnybeetroot 1 day ago
Comment by llimllib 1 day ago
Comment by devrimozcay 1 day ago
One thing we've been seeing with production AI agents is that the real risk isn't just filesystem access, but the chain of actions agents can take once they have tool access.
Even a simple log-reading capability can escalate if the agent starts triggering automated workflows or calling internal APIs.
We've been experimenting with incident-aware agents that detect abnormal behavior and automatically generate incident reports with suggested fixes.
Curious if you're thinking about integrating behavioral monitoring or anomaly detection on top of the sandbox layer.
Comment by matifali 1 day ago
Comment by NegativeLatency 1 day ago
I’m assuming it’s similar to why people run plex, web servers, file sharing, etc
Also personally I’d rather not pay monthly fees for stuff if it can be avoided.
Comment by paxys 1 day ago
Comment by mikodin 1 day ago
Comment by deevus 1 day ago
It supports running on a TrueNAS SCALE server, or via Incus (local or remote). I'm still working on tightening the security posture, but for many types of AI workflows it will be more than sufficient.
Comment by sunnybeetroot 1 day ago
Comment by Tadbitrusty 1 day ago
Comment by abhisek 1 day ago
However the challenge is, sandbox profiles (rules) are always workload-specific. How do you define “least privilege” for a workload and then enforce it through the sandbox?
Which is why general sandboxes won't be useful or even feasible. The value is in observing and probably auto-generating a baseline policy for a given workload.
Wrong or overly relaxed policies would make the sandbox ineffective against the real threats it is expected to protect against.
Comment by srid 1 day ago
Comment by sunir 1 day ago
p.s. thanks for making this; timely as I am playing whackamole with sandboxing right now.
Comment by e1g 1 day ago
Comment by brutuscat 1 day ago
Comment by mlysk 1 day ago
Comment by inoki 1 day ago
Comment by e1g 1 day ago
Comment by Finbarr 1 day ago
I built yolobox to solve this using docker/apple containers: https://github.com/finbarr/yolobox
Comment by jeff_antseed 1 day ago
Been watching microsandbox, but it's pretty early. Landlock is the Linux kernel primitive that could theoretically enable something like this, but nobody's built the nice policy layer on top yet.
Curious if anyone has a good solution for the "agent running on a remote Linux server" case. The threat model is a bit different anyway (no iMessage/keychain to protect), but filesystem and network containment still matter a lot.
Comment by carderne 1 day ago
[1] https://github.com/anthropic-experimental/sandbox-runtime [2] https://github.com/carderne/sandbox-runtime
Comment by edf13 1 day ago
Taking more of an automated supervisor approach with limited manual approval for edge cases.
Grith.ai
Comment by guimbuilds 1 day ago
Comment by kxrm 1 day ago
I use clippy with rust and the only thing I had to add was:
(subpath "/Library/Developer/CommandLineTools")
Comment by andai 1 day ago
Then I realized the only thing I care about on my local machine is "don't touch my files", and Unix users solved that in 1970. So I just run agents as "agent" user.
I think running it on a separate machine is nicer though, because it's even simpler and safer than that. (My solution still requires careful setup and regular overhead when you get permission issues. "It's on another laptop, and my stuff isn't" has neither of those problems.)
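The 1970-era primitive behind the "agent" user approach is plain Unix file permissions. A small self-contained sketch of the mode bits that would stop a separate user from reading your files:

```python
import os
import stat
import tempfile

# Create a file readable only by its owner, then inspect its mode bits.
# This owner/group/other split is the same mechanism that blocks a
# dedicated "agent" user from reading another user's personal files.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o600)  # rw for owner, nothing for group/other

mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))                   # -> 0o600
print(bool(mode & stat.S_IROTH))   # can "other" users read it? -> False
os.remove(path)
```

The "regular overhead" mentioned above is exactly this: every shared path needs its group/other bits (or an ACL) adjusted by hand.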
Comment by rwky 1 day ago
Comment by kevincloudsec 18 hours ago
Comment by datapolitical 1 day ago
But given how fast agents are moving, I would be shocked if such tools were not already being built.
Comment by ashishb 1 day ago
Comment by cuber_messenger 1 day ago
Comment by ashniu123 1 day ago
Comment by rishabhaiover 1 day ago
Comment by gozucito 2 days ago
Comment by e1g 1 day ago
Comment by arianvanp 1 day ago
Comment by grun 1 day ago
Comment by dbmikus 1 day ago
How does this compare with Codex's and Claude's built-in sandboxing?
Comment by e1g 1 day ago
Codex: IIRC, only shell commands are sandboxed; the actual agent runtime is not.
Comment by dbmikus 1 day ago
Comment by wek 1 day ago
Comment by boxedemp 1 day ago
Comment by snthpy 7 hours ago
Comment by treexs 1 day ago
Comment by e1g 1 day ago
The alternative would be “no site”, which is still somehow worse.
Comment by sagarpatil 1 day ago
Comment by vivid242 1 day ago
Comment by cowpig 1 day ago
I'm involved with a project building something very similar, which we literally open sourced an alpha version of last week:
https://github.com/GreyhavenHQ/greywall
It's a bit different in that:
- We started with Linux
- It is a binary that wraps the agent runtime
- It runs alongside a proxy which captures all traffic to provide a visibility layer
- Rules can be changed dynamically at runtime
I am so happy this problem is getting the attention it deserves!
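The "rules can be changed dynamically at runtime" point can be boiled down to a tiny sketch. This is not greywall's actual API — the file path and rule format here are hypothetical — but it shows the idea of a proxy re-reading an allowlist on every check:

```python
import fnmatch
import json
from pathlib import Path

# Hypothetical runtime-reloadable egress allowlist: a JSON file holding
# glob patterns for permitted hosts. The proxy re-reads it on every
# check, so edits take effect without restarting the agent.
RULES = Path("/tmp/egress-rules.json")

def allowed(host: str) -> bool:
    patterns = json.loads(RULES.read_text()) if RULES.exists() else []
    return any(fnmatch.fnmatch(host, p) for p in patterns)
```

A real proxy would cache on mtime and match against the full request, but the reload-per-decision shape is what makes runtime rule changes possible.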
Comment by m3kw9 1 day ago
Comment by croes 1 day ago
Why always the fixation on the hardware?
Comment by cjbarber 1 day ago
Comment by ai_fry_ur_brain 1 day ago
Comment by nemo44x 1 day ago
All the issues we get from AI today (hallucinations, goal shift, context decay, etc.) get amplified unbelievably fast once you begin scaling agents out, because failures cascade. The risk being you go to bed and when you wake up your entire infrastructure is gone lol.
Comment by BLACKCRAB 49 minutes ago
Comment by CloakHQ 1 day ago
Comment by oliver_dr 1 day ago
Comment by babbagegao 1 day ago
Comment by octoclaw 1 day ago
Comment by bhekanik 1 day ago
Comment by babbagegao 1 day ago
Comment by yowang 1 day ago
Comment by rex_claw 1 day ago
Comment by babbagegao 1 day ago
Comment by maciver 1 day ago
Comment by naomi_kynes 1 day ago
Most setups handle this awkwardly: fire a webhook, write to a log, hope the human is watching. The sandbox keeps the agent contained, but doesn't give it a clean "pause and ask" primitive. The agent either guesses (risky) or silently fails (frustrating).
Seems like there are two layers: the security boundary (sandbox-exec, containers, etc.) and the communication boundary (how does a contained agent reach the human?). This project nails the first. The second is still awkward for most setups.
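A minimal "pause and ask" primitive for a contained agent can be as dumb as a pair of files in a directory the human watches. Everything here is a hypothetical sketch (paths, file naming, and the polling loop are illustrative), but it shows the blocking hand-off shape:

```python
import json
import time
import uuid
from pathlib import Path

# Hypothetical "pause and ask" primitive: the sandboxed agent drops a
# request file into a shared directory, then blocks until a matching
# answer file appears (written by the human, or by a supervisor UI).
INBOX = Path("/tmp/agent-approvals")

def ask_human(question: str, timeout: float = 300.0) -> str:
    INBOX.mkdir(exist_ok=True)
    req_id = uuid.uuid4().hex
    (INBOX / f"{req_id}.request").write_text(json.dumps({"question": question}))
    answer_path = INBOX / f"{req_id}.answer"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if answer_path.exists():
            return answer_path.read_text().strip()
        time.sleep(0.1)  # polling; inotify/kqueue would be cleaner
    raise TimeoutError("no human response")
```

The nice property is that the sandbox only needs to allow one writable directory; the communication boundary rides on the security boundary instead of punching through it.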
Comment by niyikiza 1 day ago
Comment by e1g 1 day ago
Comment by jamiemallers 1 day ago
Comment by 10keane 1 day ago
Comment by moehj 1 day ago
Comment by aplomb1026 1 day ago
Comment by Agent_Builder 1 day ago
Comment by openclaw01 1 day ago
Comment by devonkelley 1 day ago
Comment by poopiokaka 1 day ago
Comment by bschmidt97979 1 day ago
Comment by gnanagurusrgs 1 day ago
sandbox-profiles is a solid primitive for local agents. The missing piece in production is the tool layer — even a sandboxed agent can still make dangerous API calls if the MCP tools it has access to aren't individually authed and scoped.
The real stack is: sandbox the runtime (what Agent Safehouse does) + scope the tools (what we do with JIT OAuth at the MCP layer). Neither alone is enough.
Nice work shipping this.
https://www.arcade.dev/blog/ai-agent-auth-challenges-develop...
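The "scope the tools" layer described above reduces, at its core, to a per-tool scope check on whatever token the agent carries. A toy sketch (tool names, scope strings, and the function are all made up for illustration — this is not Arcade's or MCP's actual API):

```python
# Toy per-tool authorization: each tool requires an explicit scope, and
# an agent's token only carries the scopes it was granted just-in-time.
# Tool and scope names are hypothetical.
TOOL_SCOPES = {
    "read_logs": "logs:read",
    "open_pr": "repos:write",
    "delete_repo": "repos:admin",
}

def authorize(tool: str, token_scopes: set[str]) -> bool:
    required = TOOL_SCOPES.get(tool)
    # Unknown tools are denied outright; known tools need their scope.
    return required is not None and required in token_scopes
```

The sandbox bounds what the runtime can touch locally; a check like this bounds what each tool call can do remotely.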