WarpGrep – RL Subagent for Fast Context (Like SWE-Grep)
Posted by bhaktatejas922 16 hours ago
Comments
Comment by bhaktatejas922 16 hours ago
We’re the team behind WarpGrep. It’s a fast context-retrieval subagent built to attack two problems: coding agents spend roughly 60% of their time just searching for context, and long sessions suffer badly from context rot.
We built this because we found that standard RAG or naive context stuffing leads to "context rot": irrelevant files poison the model’s reasoning on long-horizon tasks. Inspired by Cognition’s SWE-Grep, we wanted to build an accessible version that integrates via MCP (Model Context Protocol) or an SDK.
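To give a feel for the integration, here is a minimal sketch of calling the subagent as a plain HTTP/SDK-style endpoint from a host agent. The endpoint, field names, and response shape below are illustrative placeholders, not the exact API surface.

    # Illustrative sketch only: the endpoint and JSON fields are placeholders.
    import requests

    def fetch_context(query: str, repo_path: str) -> list[dict]:
        """Ask the retrieval subagent for the snippets relevant to `query`."""
        resp = requests.post(
            "https://api.example.com/v1/retrieve",  # placeholder endpoint
            json={"query": query, "repo": repo_path, "max_turns": 4},
            timeout=30,
        )
        resp.raise_for_status()
        # Each hit is a file path plus the line ranges the subagent judged relevant.
        return resp.json()["hits"]

    hits = fetch_context("where is the retry backoff configured?", "/work/my-repo")
    for h in hits:
        print(h["path"], h["lines"])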
How it works: Instead of a single prompt trying to do everything, WarpGrep treats context retrieval as a distinct, RL-trained system. We reward correct context retrieval and penalize irrelevant lines.
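The reward shape is roughly this (a simplified sketch, not our actual training code; retrieved and gold are sets of (file, line) pairs, and the penalty weight is made up for illustration):

    def retrieval_reward(retrieved: set, gold: set, noise_penalty: float = 0.1) -> float:
        """Reward recall of the gold context, penalize every irrelevant line returned."""
        hits = len(retrieved & gold)
        noise = len(retrieved - gold)  # irrelevant lines that would rot the main agent's context
        recall = hits / max(len(gold), 1)
        return recall - noise_penalty * noise

The penalty on noise is what keeps the subagent from just dumping whole files back, which is exactly the failure mode that causes context rot in the first place.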
Constraints: it operates on a strict budget of 4 turns.
Parallelism: it executes up to 8 parallel tool calls per turn (grep, list, read, etc.).
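In code, the rollout loop looks roughly like this (a simplified sketch; propose_tool_calls and select_relevant_lines stand in for the model-facing plumbing, which is omitted here):

    from concurrent.futures import ThreadPoolExecutor

    MAX_TURNS = 4           # hard turn budget
    MAX_PARALLEL_CALLS = 8  # tool calls fired concurrently per turn

    def run_episode(model, tools, query):
        context = []
        for _ in range(MAX_TURNS):
            # The model proposes up to 8 tool calls (grep / list / read) for this turn.
            calls = model.propose_tool_calls(query, context)[:MAX_PARALLEL_CALLS]
            if not calls:
                break  # the model decided it has enough context
            # Tool calls are I/O-bound, so a thread pool is enough to run them in parallel.
            with ThreadPoolExecutor(max_workers=MAX_PARALLEL_CALLS) as pool:
                results = list(pool.map(lambda c: tools[c.name](**c.args), calls))
            context.extend(results)
        return model.select_relevant_lines(query, context)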
Inference: We worked with NVIDIA to optimize this on B200s. We’re hitting ~900 tokens/sec (compared to SWE-Grep’s ~650 t/s). Heavy prefill optimization was critical here: grep-style retrieval is read-heavy, so most tokens are tool output being ingested as prompt rather than tokens being generated.
The results: In our internal benchmarks, offloading retrieval to this subagent speeds up tasks by 40% and cuts token usage by roughly the same amount. More importantly, it seems to reduce context rot by ~70% on longer tasks because the main agent isn’t distracted by irrelevant file headers. On SWE-Bench Pro we see a 5-12% improvement on long-horizon tasks, and chats stay stable for 2-3x more user messages.
It works with any coding agent that can use MCP, including Claude Code, Codex, and OpenCode. We’re curious to see how it handles your edge cases (especially huge repos).
There is a free tier, but if you want to push it hard, you can use the code BF16 for 40M tokens of credit to test the API limits. We recommend adding a payment method to get around the rate limits, but you won’t be charged until December 14th, at which point it will still be almost 10x cheaper than Claude Haiku.
Happy to answer questions about the CUDA optimizations or the RL training process!