Ask HN: What are you using to mitigate prompt injection?
Posted by ramoz 3 hours ago
If anything at all.
Comments
Comment by oliver_dr 2 hours ago
Input-side (preventing injection):
- Strict input sanitization with role-boundary enforcement in the system prompt. Sounds basic, but most people skip it.
- Separate "user content" from "system instructions" at the API level. Don't concatenate untrusted input into your system prompt. Use the dedicated `user` role in the messages array.
- For tool-calling agents, validate that tool arguments match expected schemas before execution. An LLM-as-judge approach to tool-call safety is expensive but effective for high-stakes actions.
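To make the input-side points concrete, here is a minimal sketch of both practices, assuming an OpenAI-style messages array; the `transfer_funds` tool, its schema, and all names are illustrative, not any particular vendor's API:

```python
import json
import re

def build_messages(system_prompt: str, untrusted_user_input: str) -> list[dict]:
    # Untrusted content goes in the `user` role only; it is never
    # concatenated into the system prompt string.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": untrusted_user_input},
    ]

def validate_tool_call(raw_args: str) -> dict:
    """Reject arguments for a hypothetical `transfer_funds` tool that
    don't match the expected schema, before anything executes."""
    args = json.loads(raw_args)
    if set(args) != {"account_id", "amount_cents"}:
        raise ValueError("unexpected or missing fields")
    if not (isinstance(args["account_id"], str)
            and re.fullmatch(r"acct_\d+", args["account_id"])):
        raise ValueError("account_id does not match expected pattern")
    if not (isinstance(args["amount_cents"], int)
            and 1 <= args["amount_cents"] <= 100_000):
        raise ValueError("amount_cents out of allowed range")
    return args
```

In production you'd likely reach for a schema library (e.g. jsonschema or pydantic) rather than hand-rolled checks, but the principle is the same: the model's tool-call output is untrusted input to your execution layer.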
Output-side (catching when injection succeeds):
This is the part most people underinvest in. Even with perfect input filtering, you still need output guardrails:
- Run the LLM output through evaluation metrics that score for factual correctness, instruction adherence, and safety before it reaches the user.
- For RAG systems specifically, verify that the generated answer is actually grounded in the retrieved context, not fabricated or influenced by injected instructions.
The "defense in depth" framing matters here. Input filtering alone has a ceiling because adversarial prompts evolve faster than regex rules. Output evaluation catches the failures that slip through. We use DeepRails' Defend API for this layer - it scores outputs on correctness, completeness, and safety, then auto-remediates failures before they reach end users. But the principle applies regardless of tooling: treat output verification as a first-class concern, not an afterthought.
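Tooling aside, the defense-in-depth shape is easy to sketch: each layer can reject independently, so input filtering doesn't have to be your only line of defense. Everything here (`call_model`, the check callables, the fallback text) is a placeholder, not any particular product's API:

```python
def guarded_answer(user_input: str, call_model, input_ok, output_ok,
                   fallback="Sorry, I can't help with that request."):
    if not input_ok(user_input):      # layer 1: input filtering
        return fallback
    draft = call_model(user_input)    # layer 2: the model call itself
    if not output_ok(draft):          # layer 3: output verification
        return fallback               # or auto-remediate / retry here
    return draft
```

The key design choice is that `output_ok` runs unconditionally, even when the input passed layer 1, which is what catches the injections your input filters missed.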
Simon Willison's work on dual-LLM patterns is also worth reading if you haven't: https://simonwillison.net/2023/Apr/25/dual-llm-pattern/