Is grep all you need? How agent harnesses reshape agentic search
Posted by Anon84 8 days ago
Comments
Comment by softwaredoug 7 days ago
If you think grep is great, it’s because you’ve been social engineered to organize your content to be findable. We document why something is useful to an agent. We put it in a logical place.
Just organizing content is at least half of building search, agentic or not. It’s one reason Google is successful: we’re all trying to make our content findable by the search engine. It’s not all technology :)
Comment by cpburns2009 7 days ago
This is such a strange train of thought. How did you get there?
Comment by softwaredoug 7 days ago
Incentives to make things findable are more important to search than any technology.
Comment by nh23423fefe 7 days ago
so if i just index and search then i can stop writing like that?
Comment by allan_s 7 days ago
Comment by piekvorst 7 days ago
Lines are a fundamental building block of text and it’s not unreasonable to optimize them.
Comment by giancarlostoro 7 days ago
Comment by quinncom 7 days ago
> We evaluate on a 116-question representative subset of the LongMemEval benchmark (Wu et al., 2025), which tests an agent’s ability to answer questions over long conversations spanning multiple sessions.
Comment by schipperai 7 days ago
Comment by alexrigler 7 days ago
Comment by piekvorst 7 days ago
grep’s design is surprisingly winning, exceeding expectations to this day.
Comment by weaksauce 7 days ago
pretty fast and neat project to search code interactively with a lot of optimizations on finding the right thing
Comment by piekvorst 3 days ago
This is a promising road that I would probably not take. I have learned to live with simple per-line regular expressions. I have never felt that they slow me down.
In fact, the opposite is true: they let me craft fuzzy queries clearly, i.e., to balance the fuzziness across the query. I’ve never learned to do that with the black-box intelligent queries, which severely limited my scope in the past.
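One way to read "balancing the fuzziness across the query" is a per-line pattern that is loose in some places and strict in others. The sketch below is only an illustration of that reading; the sample lines and the pattern are invented, not taken from the comment.

```python
import re

# Invented source lines, for illustration only.
lines = [
    "def retry_request(url, timeout=30):",
    "# retries are disabled when the timeout is zero",
    "timeout = 30  # seconds",
    "retrieve the cached value",
]

# Loose on the first word (any word starting with "retr"), loose on the gap
# (up to 40 characters), strict on the second word (exactly "timeout").
pattern = re.compile(r"\bretr\w*\b.{0,40}\btimeout\b", re.IGNORECASE)

# Per-line matching, as grep does: each line is tested on its own.
matches = [line for line in lines if pattern.search(line)]
for line in matches:
    print(line)
```

Every part of the pattern states how much slack it allows, so it is easy to tighten one part without touching the rest.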
Comment by boyter 7 days ago
Comment by contextfree 7 days ago
Comment by bee_rider 7 days ago
Comment by inetknght 7 days ago
It's best not to use Microsoft products.
Comment by contextfree 7 days ago
Comment by yndoendo 6 days ago
Microsoft had to replace _powershell_ with _pwsh_ because of the anti-consumer aliasing they did. My powershell profile is full of all the commands to remove those aliases.
Last time I checked, Microsoft even creates a python alias that brings up their store instead of calling the manually installed exe in your defined path.
Comment by SkitterKherpi 7 days ago
Comment by pipeline_peak 7 days ago
Compilations break all the time and those symbols either become useless or it’s just quicker to use grep.
Comment by hmokiguess 8 days ago
Comment by sdesol 8 days ago
https://github.com/gitsense/gsc-cli
`gsc grep` is just an alias for `gsc rg`, mostly because agents are much more likely to reach for “grep” than “rg”.
It works pretty well, but it is not a perfect drop-in replacement. `grep` and `ripgrep` differ in a few details, especially around glob/wildcard behaviour and flags. What I found works is to not use `grep` in search examples, and have the CLI spit out an error message for the AI saying this is `ripgrep`, so it needs to use `ripgrep` syntax.
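A minimal sketch of the "error message for the AI" idea, assuming a hypothetical wrapper and not taken from gsc-cli: before handing arguments to ripgrep, check for glob flags that `grep` accepts but ripgrep does not, and reply with a hint. The names `GREP_ONLY_HINTS`, `check_args`, and `main` are made up for illustration.

```python
import sys

# Hypothetical table of grep-only flags and the ripgrep spelling to suggest.
# ripgrep filters files with -g/--glob and has no --include or --exclude.
GREP_ONLY_HINTS = {
    "--include": "use -g 'GLOB' instead",
    "--exclude": "use -g '!GLOB' instead",
}


def check_args(args):
    """Return one hint message per grep-only flag found in args."""
    hints = []
    for arg in args:
        flag = arg.split("=", 1)[0]  # handle both --include=X and --include X
        if flag in GREP_ONLY_HINTS:
            hints.append(f"{flag}: this is ripgrep, {GREP_ONLY_HINTS[flag]}")
    return hints


def main(argv):
    """Print hints to stderr and return a non-zero status if any were found."""
    hints = check_args(argv)
    for hint in hints:
        print(f"error: {hint}", file=sys.stderr)
    return 2 if hints else 0
```

A real wrapper would exec ripgrep when `main` returns 0; that part is left out here.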
Comment by cyanydeez 7 days ago
Comment by fwip 7 days ago
Comment by verdverm 8 days ago
It depends on if it is using Grep the harness tool or Grep from the bash tool
Comment by hmokiguess 8 days ago
Comment by joelfried 8 days ago
If you'd told me a decade ago I'd finally learn some sed in '26 because I'd want to understand what the AI was doing, I'd have told you you were crazy . . .
Comment by Analemma_ 7 days ago
Comment by celrod 8 days ago
https://github.com/Genivia/ugrep#aliases
Claude Code may ship with ugrep already.
Comment by gbacon 8 days ago
Comment by sdesol 7 days ago
Comment by jeffchuber 8 days ago
- regex (grep)
- hybrid search (bm25+vector)
this X vs Y is uninteresting when the answer can be both.
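As a rough illustration of "both", the sketch below merges a regex result list with a second ranked list using reciprocal rank fusion. RRF is one common fusion choice, not something named in this thread. The documents are invented, and `overlap_rank` is a crude word-overlap stand-in for a real BM25 or vector ranker.

```python
import re
from collections import defaultdict

# Invented documents, for illustration only.
DOCS = {
    "a.md": "Rotate the API key every 90 days.",
    "b.md": "Credentials should be refreshed on a schedule.",
    "c.md": "The API key lives in the vault; rotate it quarterly.",
}


def regex_rank(pattern, docs):
    """Docs matching the regex, ordered by match count, then by name."""
    rx = re.compile(pattern, re.IGNORECASE)
    counts = {name: len(rx.findall(text)) for name, text in docs.items()}
    return sorted((n for n, c in counts.items() if c), key=lambda n: (-counts[n], n))


def overlap_rank(query, docs):
    """Stand-in for a BM25 or vector ranker: order docs by shared query words."""
    q = set(query.lower().split())
    scores = {n: len(q & set(re.findall(r"\w+", t.lower()))) for n, t in docs.items()}
    return sorted((n for n, s in scores.items() if s), key=lambda n: (-scores[n], n))


def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over each list a doc is in."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, name in enumerate(ranking, start=1):
            fused[name] += 1.0 / (k + rank)
    return sorted(fused, key=lambda n: (-fused[n], n))


results = rrf([
    regex_rank(r"api key", DOCS),
    overlap_rank("rotate credentials schedule", DOCS),
])
print(results)
```

Here `b.md` never matches the regex and only surfaces through the second ranker, while documents found by both lists rise to the top.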
Comment by pastel8739 7 days ago
Comment by fnordpiglet 7 days ago
Comment by mediaman 7 days ago
I agree it's very frustrating to use with custom tools/harnesses that can speed up the process for domain specific purposes.
Comment by bachittle 7 days ago
Comment by budududuroiu 7 days ago
Comment by dominotw 7 days ago
Comment by jeffchuber 7 days ago
Comment by worthless-trash 7 days ago
Comment by sdesol 7 days ago
What do you mean by this? Do you mean not automatically build the index?
Comment by worthless-trash 7 days ago
Comment by worthless-trash 7 days ago
Comment by piker 8 days ago
I wrote about it[1] and came away with a different view on both Palantir and the future of agentic workflows personally.
[1] sorry, LinkedIn: https://www.linkedin.com/pulse/fund-managements-killer-app-d...
Comment by darkteflon 7 days ago
> But it would make no sense to have an LLM regurgitate an existing form document token-by-token rather than call a piece of 1994 software like Hotdocs to populate some placeholders.
This is a real “oof”, isn’t it. Very difficult to understand what they were going for here. Perhaps they just assumed no one in the intended audience would pick it up. But it certainly is enough of a red flag that it made me go back to the top of your write-up for a re-read, thinking about their whole pipeline in much more sceptical terms.
Comment by piker 7 days ago
Edit: looks like you’re in London, too. Hit me up and let’s connect. My details are in the bio!
Comment by darkteflon 7 days ago
Comment by SkyPuncher 7 days ago
So far every Grep vs RAG discussion I've seen conflates overlapping factors. The most common is simply that a company rebuilt their pipeline from scratch and fixed a bunch of problems. The worst is when they go from one-shot RAG to multi-step Grep and completely miss the fact that multi-step RAG would likely get them similar results.
At the end of the day, the most important thing is knowing the _product features_ your users care about and making sure that's represented in the pipeline.
Comment by ako 7 days ago
Comment by krzyk 7 days ago
Comment by moljac024 7 days ago
Comment by stephantul 7 days ago
Comment by 0xbadcafebee 7 days ago
And a lot more tokens, and slower speed. Yes you can get more accuracy if you suck tons more data into context.
But compare this to more advanced code agent methods like Tree-sitter, PageRank, and LSP, which build semantic maps to provide more relevant context. Grep alone can't do that.
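To make the PageRank part concrete, here is a small sketch that ranks files by how often other files reference them. The reference graph is invented; in practice it might come from a Tree-sitter parse or an LSP server, which is not shown. How any particular agent wires this up is an assumption, not something stated in this thread.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of referenced nodes."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node starts each round with the teleport share.
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = graph.get(n, [])
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[n] / len(nodes)
                for t in nodes:
                    new[t] += share
        rank = new
    return rank


# Invented "file references file" graph, for illustration only.
references = {
    "cli.py": ["config.py", "search.py"],
    "search.py": ["config.py", "index.py"],
    "index.py": ["config.py"],
    "config.py": [],
}

ranks = pagerank(references)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{name}: {ranks[name]:.3f}")
```

The most-referenced file ranks first, which is the kind of signal a harness could use to decide what to put in context before any grep runs.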
Comment by greenavocado 7 days ago
Comment by nibbleyou 7 days ago
Comment by liminal 7 days ago
Comment by yodon 8 days ago
Comment by verdverm 8 days ago
I'm currently working on a markdown kb / search tool for my agents, in part built on TS
Comment by yanhangyhy 7 days ago
Comment by yetanotherjosh 7 days ago
> LongMemEval rewards recovering literal witnesses: exact dates, counts, preferences, and spans that often remain stable under tokenization.
Is this saying they chose a benchmark that is biased towards doing well against literal string matching, thus works well with grep, and then (gasp) showed that grep did well, finally declaring "grep is all you need"?
The examples in the benchmark's demo image(1) are all cases where you could see grep doing well: a conversation about bikes, then a query about bike(s), where "bike" is a common token hit. But not stuff like a conversation about a Beethoven sonata, then a question about classical music, where an embedding-based approach would shine.
(1) https://github.com/xiaowu0162/LongMemEval/blob/main/assets/l...
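A tiny sketch of that contrast, with invented conversation lines rather than data from LongMemEval: a literal, case-insensitive match finds the bike line but returns nothing for the classical music query, because the query and the stored text share no token.

```python
import re


def grep_lines(pattern, lines):
    """Return the lines that match pattern, case-insensitively, like `grep -i`."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [line for line in lines if rx.search(line)]


# Invented conversation snippets, for illustration only.
history = [
    "I finally bought a new bike last weekend.",
    "Been practising the Beethoven sonata all week.",
]

bike_hits = grep_lines(r"bikes?", history)             # shares the token "bike"
music_hits = grep_lines(r"classical music", history)   # shares no token

print(bike_hits)
print(music_hits)
```

An agent can sometimes bridge the gap by trying more queries such as "sonata" or "Beethoven", but it has to guess those terms first.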
Comment by kwillets 7 days ago
Comment by _pdp_ 7 days ago
Comment by sys_64738 8 days ago
Comment by eliaslumer26 7 days ago
Comment by KaiShips 7 days ago
Comment by sdesol 8 days ago
Comment by tailor_gunjan93 7 days ago
Comment by Pranavsingh431 7 days ago
Comment by ashishdhiman23 7 days ago
Comment by gauravvij137 7 days ago
Comment by wseadowntown 8 days ago