Ask HN: How do you authorize AI agent actions in production?

Posted by naolbeyene 2 days ago

I'm deploying AI agents that can call external APIs – process refunds, send emails, modify databases. The agent decides what to do based on user input and LLM reasoning.

My concern: the agent sometimes attempts actions it shouldn't, and there's no clear audit trail of what it did or why.

Current options I see:

1. Trust the agent fully (scary)
2. Manual review of every action (defeats automation)
3. Some kind of permission/approval layer (does this exist?)

For those running AI agents in production:

- How do you limit what the agent CAN do?
- Do you require approval for high-risk operations?
- How do you audit what happened after the fact?

Curious what patterns have worked.

Comments

Comment by kxbnb 1 hour ago

The middleware proxy approach unop mentioned is the right pattern - you need an enforcement point the agent can't bypass.

At keypost.ai we're building exactly this for MCP pipelines. The proxy evaluates every tool call against deterministic policy before it reaches the actual tool. No LLM in the decision path, so no reasoning around the rules.
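
Roughly what that looks like, as a simplified sketch rather than our actual implementation (the tool names and rule fields are made up):

    # Deterministic policy check in the proxy: plain rules, no LLM in the decision path.
    # Rule fields and tool names are illustrative, not a real schema.
    RULES = [
        {"tool": "refunds.create", "max_amount": 100, "effect": "allow"},
        {"tool": "db.delete", "environments": ["staging"], "effect": "allow"},
    ]

    def evaluate(tool: str, args: dict) -> tuple[str, str]:
        for rule in RULES:
            if rule["tool"] != tool:
                continue
            if "max_amount" in rule and args.get("amount", 0) > rule["max_amount"]:
                return "deny", f"amount exceeds {rule['max_amount']}"
            if "environments" in rule and args.get("env") not in rule["environments"]:
                return "deny", f"{tool} not allowed in {args.get('env')}"
            return "allow", f"matched rule for {tool}"
        return "deny", "no rule covers this tool"  # default-deny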

Re: chrisjj's point about "fix the program" - the challenge is that agents are non-deterministic by design. You can't unit test every possible action sequence. So you need runtime enforcement as a safety net, similar to how we use IAM even for well-tested code.

The human-in-the-loop suggestion works but doesn't scale. What we're seeing teams want is conditional human approval - only trigger review when the action crosses a risk threshold (first time deleting in prod, spend over $X, etc.), not for every call.
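
Something like this, to make "conditional" concrete (the thresholds and the review queue are just examples):

    # Conditional approval: only escalate to a human when a call crosses a risk threshold.
    HIGH_RISK = {"db.delete", "payments.transfer"}

    def route(tool: str, args: dict, policy_decision: str) -> str:
        if policy_decision == "deny":
            return "blocked"
        if tool in HIGH_RISK or args.get("amount", 0) > 500:
            return "pending_human_approval"  # goes to a review queue
        return "auto_approved"               # low-risk calls proceed immediately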

The audit trail gap is real. Most teams log tool calls but not the policy decision. When something goes wrong, you want to know: was it allowed by policy, or did policy not cover this case?
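
An audit record that can answer that needs both halves, e.g. (field names are just an example, not a standard):

    import json, time

    # Log the policy decision alongside the tool call, not just the call itself.
    def audit(tool, args, decision, reason, rule_id=None):
        record = {
            "ts": time.time(),
            "tool": tool,
            "args": args,
            "decision": decision,   # allow / deny / needs_approval
            "reason": reason,       # which rule fired, or "no rule covers this case"
            "rule_id": rule_id,     # None => policy gap, not an explicit allow
        }
        print(json.dumps(record))   # ship to your log pipeline in practice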

Comment by unop 1 day ago

Tool calls with middleware. If you deploy an agent into a production system, you design it to use a curated whitelist of bespoke tool calls against services in your stack.

Also, you should never connect an agent directly to a sensitive database server, an order/fulfillment system, etc. Rather, you'd use a "middleware proxy" to arbitrate the requests, consult a policy engine, log processing context, etc. before relaying them on to the target system.
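
The shape is roughly this (a rough sketch; check_policy and write_audit stand in for your policy engine and log sink, and the agent only ever gets the proxied function, never the raw client):

    # The proxy sits between the agent and the real system.
    def check_policy(tool, args):
        return "allow" if tool in {"orders.lookup", "email.send"} else "deny"

    def write_audit(tool, args, decision):
        print({"tool": tool, "args": args, "decision": decision})

    def wrap_tool(name, real_call):
        def proxied(**args):
            decision = check_policy(name, args)  # consult policy before anything else
            write_audit(name, args, decision)    # log the request and the decision
            if decision != "allow":
                raise PermissionError(f"{name} blocked by policy")
            return real_call(**args)             # only now relay to the target system
        return proxied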

Also consider subtleties in the threat model and the types of attack vector, e.g. how many systems the agent(s) connect to concurrently. See the lethal trifecta: https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

Comment by chrisjj 2 days ago

If one asked the same about any other kind of program that was known to be likely to produce incorrect and damaging output, the answer would be obvious: fix the program.

It is instructive to consider why the same does not apply in this case.

And see https://www.schneier.com/blog/archives/2026/01/why-ai-keeps-... .

Comment by throw03172019 2 days ago

Human in the loop for certain actions.

Comment by chrisjj 2 days ago

But how do you get the bot to comply?