The Agentic AI Handbook: Production-Ready Patterns
Posted by SouravInsights 4 days ago
Comments
Comment by alkonaut 3 days ago
Like, if I'm not ready to jump on some AI-spiced-up special IDE, am I just going to be left banging rocks together? It feels like some of these AI agent companies just decided "OK, we can't fit this into the old IDEs, so we'll build a new special IDE"? Or did I just use the wrong tools? (I use Rider and VS, and I have only tried Copilot so far, but the "agent mode" of Copilot in those IDEs feels basically useless.)
Comment by prettygood 3 days ago
Comment by kace91 3 days ago
If you read someone say “I don’t know what’s the big deal with vim, I ran it and pressed some keys and it didn’t write text at all” they’d be mocked for it.
But with these tools there seems to be an attitude of “if I don’t get results straight away it’s bad”. Why the difference?
Comment by Macha 3 days ago
Comment by kace91 3 days ago
Comment by dist-epoch 3 days ago
Comment by Macha 3 days ago
Vim never broke into the workplace the way AI has, with companies measuring AI use among their employees. Nobody's asked me in a performance review how I've used vim keybinds to improve the company's growth.
Comment by alkonaut 3 days ago
I get the same change applied multiple times, the agent applying changes in some absurd way that conflicts with what I've already written, like some git merge from hell, and so on. I can't get it to understand even the simplest of contexts, etc.
It's not really that the code it writes might not work. I just can't get past the actual tool use. In fact, I don't think I'm even at the stage where the AI output is even the problem yet.
Comment by kace91 3 days ago
>I get the same change applied multiple times, the agent applying changes in some absurd way that conflicts with what I've already written, like some git merge from hell, and so on. I can't get it to understand even the simplest of contexts, etc.
That is weird. Results have a ton of variation, but not that much.
Say you get a claude subscription, point it to a relatively self contained file in your project, hand it the command to run relevant tests, and tell it to find quick win refactoring opportunities, making sure that the business outcome of the tests is maintained even if mocks need to change.
You should get relevant suggestions for refactoring, you should be able to have the changes applied reasonably, you should have the tests passing after some iterations of running and fixing by itself. At most you might need to check that it doesn't cheat by getting a false positive in a test or something similar.
Is such an exercise not working for you? I'm genuinely curious.
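To make it concrete, the prompt doesn't need to be anything fancy. Something along these lines (the file path and test command here are made up, adjust them to your project):

    Look at src/billing/invoice_calculator.py. Find quick-win refactoring
    opportunities and apply them one at a time. After each change, run
    pytest tests/billing/test_invoice_calculator.py and keep iterating until
    the tests pass. The business outcome asserted by the tests must stay the
    same; mocks may change if needed.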
Comment by TeMPOraL 3 days ago
Sure it can, because nobody is reading manuals anymore :).
It's an interesting exercise to try: take your favorite tool that you use often (one that isn't some recent webshit devoid of any documentation), find a manual (not a man page), and read it cover to cover. Say, GDB or Emacs or even coreutils. It's surprising just how many powerful features good software tools have, how much you'll learn in a short time, and how much of it most software people don't know is possible (or worse, decry as "too much complexity") just because they couldn't be arsed to read some documentation.
> I just can't get past the actual tool use. In fact, I don't think I'm even at the stage where the AI output is even the problem yet.
The tools are a problem because they're new and a moving target. They're both dead simple and somehow complex around the edges. AI, too, is tricky to work with, particularly when people aren't used to communicating clearly. There are a lot of surprising problems (such as "absurd method of applying changes") that come from the fact that AI is solving a very broad class of problems, everywhere at the same time, by virtue of being a general tool. It still needs a bit of hand-holding if your project/conventions stray away from what's obvious or popular in a particular domain. But it's getting easier and easier as the months go by.
FWIW, I too haven't developed a proper agentic workflow with CLI tools for myself just yet; depending on the project, I either get stellar results or garbage. But I recognize this is only a matter of time investment: I didn't have much time to set aside and do it properly.
Comment by neumann 3 days ago
Comment by dist-epoch 3 days ago
Comment by galaxyLogic 3 days ago
AI is supposed to make our work easier.
Comment by Nekobai 3 days ago
Comment by galaxyLogic 3 days ago
With VIM or Emacs I am supposed to know what Ctrl-X does. But with AI tools (ideally) I should be able to ask AI (in English) to edit the document for me?
Maybe the reason we can't do it that way is that "we're not there yet"?
Comment by kace91 3 days ago
Comment by walt_grata 3 days ago
Comment by chewz 3 days ago
Comment by embedding-shape 3 days ago
I feel like that matters more than the tooling at this point.
I can't really understand letting LLMs decide what to test or not; they seem to completely miss the boat when it comes to testing. Half the tests are useless because they duplicate what they test, and the other half don't test what they should be testing. So many shortcuts. LLMs require A LOT of hand-holding when writing tests, more so than with other code, I'd wager.
Comment by Balinares 3 days ago
Comment by embedding-shape 3 days ago
We haven't figured out a way for humans to do that well :P I still see people arguing that "80% test coverage is obviously better than 70%" and similar dumb sentiments that completely miss the point.
But I agree with the first part: LLMs are massively oversold and it's hard to blame users for believing the hype. Tempered expectations, as always, win.
Comment by prettygood 3 days ago
Comment by embedding-shape 3 days ago
Comment by threecheese 3 days ago
Comment by tasuki 3 days ago
I think so. The humans should be writing the spec. The AI can then (try to) make the tests pass.
Comment by sixtyj 3 days ago
LLMs just fail (hallucinate) in lesser-known fields of expertise.
Funny: today I asked Claude to give me the syntax for running Claude Code, and its answer was totally wrong :) So you go to the documentation… and parts of it are obsolete as well.
LLM development is very much in the "move fast and break things" style.
So in a few years there will be so many repos full of gibberish code, because "everybody is a coder now", even basketball players or taxi drivers (no offense, ofc, just an example).
It is like giving an F1 car to me :)
Comment by agumonkey 3 days ago
Comment by CurleighBraces 3 days ago
There's obviously a whole heap of hype to cut through here, but there is real value to be had.
For example yesterday I had a bug where my embedded device was hard crashing when I called reset. We narrowed it down to the tool we used to flash the code.
I downloaded the repository, jumped into codex, explained the symptoms and it found and fixed the bug in less than ten minutes.
There is absolutely no way I'd have been able to achieve that speed of resolution myself.
Comment by Bewelge 3 days ago
- I downloaded the repository, jumped into codex, explained the symptoms and it found and fixed the bug in less than ten minutes.
Change the second step to: - I downloaded the repository, explained the symptoms, copied the relevant files into Claude Web and 10 minutes later it had provided me with the solution to the bug.
Now I definitely see the ergonomic improvement of Claude running directly in your directory, saving you copy/paste twice. But in my experience the hard parts are explaining the symptoms and deciding what goes into the context.
And let's face it, in both scenarios you fixed a bug in 10-15 minutes which might have taken you a whole hour/day/week before. It's safe to say that LLMs are an incredible technological advancement. But the discussion about tooling feels like vim vs emacs vs IDEs. Maybe you save a few minutes with one tool over the other, but that saving is often blown out of proportion. The speedup I gain from LLMs (on some tasks) is incredible. But it's certainly not due to the interface I use them in.
Also I do believe LLM/agent integrations in your IDE are the obvious future. But the current implementations still add enough friction that I don't use them as daily drivers.
Comment by CurleighBraces 3 days ago
Once I started working this way however, I found myself starting to adapt to it.
It's not unusual now to find myself with at least a couple of simultaneous coding sessions, which I couldn't see myself doing with the friction that using Claude Web/Codex web provides.
I also entirely agree that there's going to be a lot of innovation here.
IDEs imo will change to become increasingly focused on reading/reviewing code rather than writing, and in fact might look entirely different.
Comment by Bewelge 3 days ago
I envy you for that. I'm not there yet. I also notice that actually writing the code helps me think through problems and now I sometimes struggle because you have to formulate problems up front. Still have some brain rewiring to do :)
Comment by CurleighBraces 3 days ago
"I can literally feel competence draining out of my fingers"
Comment by theshrike79 2 days ago
My daily process is like this:
Claude plans (Opus 4.5)
Claude implements (Opus at work, Sonnet at home - I only have the $20 plan personally :P )
After implementation the relevant files are staged
Then I start a codex tab, tell it to review the changes in the staged files
I read through the review, if it seems valid or has critical issues ->
Clear context on Claude, give it the review and ask it to evaluate if it's valid.
Contemplate on the diff of both responses (Codex is sometimes a bit pedantic or doesn't get the wider context of things) and tell Claude what to fix
If I'm at home and Claude's quota is full, I use ampcode's free tier to implement the fix.
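If you want to script the review/cross-check part instead of doing it by hand, here's a rough non-interactive sketch of the same loop (claude -p is Claude Code's print mode; the codex exec call and the prompts are assumptions, substitute whatever your second tool provides):

    import subprocess

    def run(cmd):
        """Run a command and return its stdout."""
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    # The staged changes Claude just implemented.
    diff = run(["git", "diff", "--cached"])

    # A second model reviews the diff with completely fresh context.
    review = run(["codex", "exec",
                  "Review this diff for bugs, missing tests and design issues:\n\n" + diff])

    # A fresh Claude session judges which review points are actually valid.
    verdict = run(["claude", "-p",
                   "Here is a diff and a review of it. Which review points are valid "
                   "and worth fixing, and which are pedantry?\n\nDIFF:\n" + diff +
                   "\n\nREVIEW:\n" + review])

    print(verdict)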
Comment by embedding-shape 3 days ago
What exactly do you mean with "integrating agents" and what did you try?
The simplest (and what I do) is not "integrating them" anywhere, but just replacing the "copy-paste code + write prompt + copy output to code" loop with "write prompt > agent reads code > agent changes code > I review and accept/reject". Not really "integration" so much as a workflow change.
Comment by alkonaut 3 days ago
I don't really get how the workflow is supposed to work, but I think it's mostly due to how the tool is made. It has like some sort of "change stack" similar to git commits/staging but which keeps conflicting with anything I manually edit.
Perhaps it's just this particular implementation (Copilot integration in VS) that is bad, and others are better? I have extreme trouble trying to feed it context and handling suggested AI changes without completely corrupting the code, even for small changes.
Comment by kaycey2022 3 days ago
Comment by embedding-shape 3 days ago
The workflow I have right now, is something like what I put before, and I do it with Codex and Claude Code, both work the same. Maybe try out one of those, if you're comfortable with the terminal? It basically opens up a terminal UI, can read current files, you enter a prompt, wait, then can review the results with git or whatever VCS you use.
But I'm also never "vibe-coding"; I'm reviewing every single line, and mercilessly asking the agent to refactor whenever the code isn't up to my standards. I also restart the agent after each prompt finishes, as they get really dumb as soon as more than ~20% of their "max" context is used.
Comment by theshrike79 2 days ago
Give the agent tools to determine whether code is up to your standards: an executable or script it can run that checks code style and quality. That way the agent won't stop its loop until the checks pass - saving you time.
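The check script itself can be dead simple. A minimal sketch (the specific tools are only examples, point it at whatever formatter/linter/test runner your project already uses):

    #!/usr/bin/env python3
    """Quality gate for the agent: exits non-zero until every check passes."""
    import subprocess
    import sys

    # Example checks - swap in your own project's tooling.
    CHECKS = [
        ["ruff", "format", "--check", "."],  # formatting
        ["ruff", "check", "."],              # lint
        ["pytest", "-q"],                    # tests
    ]

    failed = False
    for cmd in CHECKS:
        print("$ " + " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            failed = True

    sys.exit(1 if failed else 0)

Tell the agent to run it after every change and not to consider the task done until it exits 0.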
Comment by ikidd 3 days ago
That's been my experience. You have to work them up to the big ask.
Comment by songodongo 3 days ago
Comment by ctmnt 3 days ago
Try one of the CLIs. That’s the good stuff right now. Claude Code (or similar) in your shell, don’t worry about agentic patterns, skills, MCP, orchestrators, etc etc. Just the CLI is plenty.
Comment by ikidd 3 days ago
Splurge on the $20 for Cursor, and install their IDE. Start with a simple project, more because it helps you see how it works than because Cursor can't handle more. Give it specific instructions and not too big a problem at one time, so you can tailor the prompt. If it's niche, consider switching the model to Opus 4.5 long enough for it to get a handle on the codebase. Use the Plan mode to start, adjust the plan, then let it go. Whenever it makes changes, you can revert to the state at a previous prompt. Use git liberally.
I'm just a dumb farmer who quit programming 20 years ago, and I use it to build stuff that works IRL for my operation constantly. A dev should be able to wrap their head around it.
Comment by hahahahhaah 3 days ago
It is like learning to code itself. You need flight hours.
Comment by cobolexpert 3 days ago
Comment by vidarh 3 days ago
How much more depends on what you're trying to do and in what language (e.g. a "favourite" pet peeve: Claude occasionally likes to use instance_variable_get() in Ruby instead of adding accessors; it's a massive code smell), but there are some generic things, such as giving it instructions on keeping notes, and giving it subagents to farm repetitive tasks out to, so that completing individual tasks doesn't fill up the context when the tasks are truly independent (in which case, for Claude Code at least, you can also tell it to run several in parallel).
But, indeed, just starting Claude Code (or Codex; I prefer Claude but it's a "personality thing" - try tools until you click with one) and telling it to do something is the most important step up from a chat window.
Comment by cobolexpert 3 days ago
Comment by TeMPOraL 3 days ago
Consider, as an example, that "Clean Code" used to be gospel; now it's mostly considered a book of antipatterns, and many developers prefer to follow Ousterhout instead of Uncle Bob. LLMs "read" both Clean Code and A Philosophy of Software Design, but without prompting they won't know which way you prefer things, so they'll synthesize something more or less in between these two near-complete opposites, mostly depending on the language they're writing code in.
The way I think about it is: "You are a staff software engineer with 15 years of experience in <tech stack used in the project>" is doing 80% of the job, by pulling in specific regions in the latent space associated with good software engineering. But the more particular you are about style, or the more your project deviates from what's the most popular practice across any dimension (whether code style or folder naming scheme or whatnot), the more you need to describe those deviations in your prompt - otherwise you'll be fighting the model. And then, it's helpful to describe any project-specific knowledge such as which tools you're using (VCS, testing framework, etc.), where the files are located, etc. so the model doesn't have to waste tokens discovering it on its own.
Prompts are about latent space management. You need to strengthen the associations you want and suppress the ones you don't. It can get wordy at times, for the same reason that explaining a complex thought to another person often takes a lot of words. The first sentence may do 90% of the job, but the remaining 20 sentences are needed to narrow in on the specific idea.
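In practice most of this ends up living in the project's instruction file (CLAUDE.md, AGENTS.md or whatever your tool reads) rather than being retyped every session. A rough sketch of the kind of content I mean - everything below is purely illustrative:

    You are a staff software engineer with 15 years of experience in Python
    backend services.

    Conventions in this project that differ from the usual:
    - Tests live next to the code (foo.py / foo_test.py), not under tests/.
    - We use attrs, not dataclasses; do not introduce dataclasses.
    - No DI framework; wiring is done by hand in app/main.py.

    Tooling:
    - Run tests with "make test", the linter with "make lint".
    - We use git; never amend or force-push.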
Comment by cobolexpert 3 days ago
Comment by raesene9 3 days ago
You could still overload with too many skills but it helps at least.
Comment by epolanski 3 days ago
That's exactly the point. Agents have their own context.
Thus, you try to leverage them by giving them ad-hoc instructions for repetitive tasks (such as reviewing code or running a test checklist) without polluting your own conversation/context.
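In Claude Code, for instance, those ad-hoc instructions live in small subagent definition files under .claude/agents/ - roughly like this (the reviewer below is just an illustration; check the current docs for the exact frontmatter fields):

    ---
    name: code-reviewer
    description: Reviews a diff for bugs, missing tests and style problems.
    tools: Read, Grep, Bash
    ---
    You are a strict code reviewer. Read the changed files, run the test
    suite, and report problems as a numbered list, most severe first.
    Do not edit any files.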
Comment by cobolexpert 3 days ago
Comment by vidarh 3 days ago
I'd rather use more of them that are brief and specialized, than try to over-correct on having a single agent try to "remember" too many rules. Not really because the description itself will eat too much context, but because having the sub-agent work for too long will accumulate too much context and dilute your initial instructions anyway.
Comment by Macha 3 days ago
Comment by alkonaut 3 days ago
But it's good to hear that it's not me being completely dumb, it's Copilot Agent Mode tooling that is?
Comment by _zoltan_ 3 days ago
And then there's Ralph with cross LLM consensus in a loop. It's great.
Comment by tmountain 3 days ago
Comment by jonathanstrange 3 days ago
Am I right in assuming that the people who use AI agent software use them in confined environments like VMs with tight version control?
Then it makes sense but the setup is not worth the hassle for me.
Comment by ramraj07 3 days ago
You should use claude code.
Comment by JeremyNT 3 days ago
Once you see what is currently possible with this technique you will understand that programming as a field is doomed, or at the very least it's becoming something almost unrecognizable.
Comment by ctmnt 3 days ago
Comment by bojan 3 days ago
Comment by dude250711 3 days ago
Comment by 63stack 3 days ago
Comment by chrz 3 days ago
Comment by breppp 3 days ago
Let me select lines in my code which you are allowed to edit in this prompt and nothing else, for those "add a function that does x" requests, without it starting to run amok.
Comment by alkonaut 3 days ago
Now it's "please add one unit test for Foobar()" and it goes away and thinks for 2 minues and does nothing then I point it to where the FooBar() which it didn't find and then adds a test method then I change the name to one I like better but now the AI change wasn't "accepted"(?) so the thing is borked...
I think the UX for agents is important and ...this can't be it.
Comment by rustyhancock 3 days ago
A high level task is given and out pops a working solution.
A) If you can't program and you're just happy to have something working you're safe.
B) If you're an experienced programmer and can specify the structure of the solution you're safe.
In between is where it seems people will struggle. How do you get from A to B?
Comment by wiseowise 3 days ago
Comment by franze 3 days ago
So my 2 cents. Use Claude Code. In Yolo mode. Use it. Learn with it.
Whenever I post something like this I get a lot of downvotes. But well... by the end of 2026 we will not use computers the way we use them now. Claude Code in Feb 2025 was the first step; now, in Jan 2026, CoWork (Claude Code for everyone else) is here. It is just a much, much more powerful way to use computers.
Comment by vidarh 3 days ago
I think it will take much longer than that for most people, but I disagree with the timeline, not where we're headed.
I have a project now where the entirety of the project falls into these categories:
- A small server that is geared towards making it easy to navigate the reports the agents produce. This server is 100% written by Claude Code - I have not even looked at it, nor do I have any interest in looking at it as it's throwaway.
- Agent definitions.
- Scripts written by the agents for the agents, to automate away the parts where we (well, mostly the agents) have found the task is mechanical enough to either take Claude out of the loop entirely, or to produce a script that does the mechanical part interspersed with claude --print calls for smaller subtasks (and then systematically trying to see if sonnet or haiku can handle those tasks). Eventually I may get to the point of optimising it to use APIs for smaller, faster models where they can handle the tasks well enough.
The goal is for an increasing proportion of the project to migrate from the second part (agent definitions) to the third part, and we do that in "production" workflows (these aren't user facing per se, but third parties do see the outputs).
That is, I started with a totally manual task I was carrying out anyway, defined agents to take over part of the process and produce intermediate reports, had it write the UI that lets me monitor the agents progress, then progressively I'd ask the agent after each step to turn any manual intervention into agents, commands, and skills, and to write tools to handle the mechanical functions we identified.
For each iteration, more stuff first went into the agent definitions, and then as I had less manual work to do, some of that time has gone into talking to the agent about which sub-tasks we can turn into scripts.
I see myself doing this more and more, and often "claude" is now the very first command I run when I start a new project whether it is code related or not.
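The scripts in that third category are mostly trivial glue: a deterministic loop on the outside, claude --print on the inside for the one judgement call that still needs a model. A sketch of the shape (the task, paths and prompts are invented for illustration; add --model if you want to test whether a smaller model copes):

    import json
    import subprocess
    from pathlib import Path

    def classify(report_text: str) -> str:
        """The only non-mechanical step: ask the model for a judgement call."""
        result = subprocess.run(
            ["claude", "--print",
             "Classify this report as OK, NEEDS_REVIEW or BROKEN. "
             "Answer with the single word only.\n\n" + report_text],
            capture_output=True, text=True, check=True)
        return result.stdout.strip()

    # Mechanical part: walk the reports, classify each one, and write a summary
    # the monitoring UI can pick up.
    summary = {report.name: classify(report.read_text())
               for report in sorted(Path("reports").glob("*.md"))}

    Path("reports/summary.json").write_text(json.dumps(summary, indent=2))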
Comment by theshrike79 2 days ago
The more you can offload to deterministic tools (script), the easier it will be to move to local LLMs when the AI bubble bursts =)
Comment by darkwater 3 days ago
Comment by TeMPOraL 3 days ago
Some 150 years ago, humanity collectively decided to try and redo everything but with electricity. In some cases, it was a clear success - e.g. lights. It enabled further progress - see e.g. computers, MRI machines, etc. In other cases, it was a failure - see e.g. cars, which still rely on ICEs despite electric cars being first, because until recently batteries just were not there. And then, in many cases the adoption was partial - see e.g. power tools, which are usually electrical, but in professional / industrial use, there's lots of hydraulic/pressurized air powered variants.
All the above took people trying things out, "throwing shit at the wall to see what sticks". We're at this stage with LLMs now.
Comment by jangxx 3 days ago
Comment by franze 3 days ago
And yes, it is a hypothesis about the future. Claude Code was just a first step. It will happen to the rest of computer use as well.
Comment by njhnjh 3 days ago
Comment by photios 3 days ago
It's a new ecosystem with its own (atrocious!) jargon that you need to learn. The good news is that it's not hard to do so. It's not as complex or revolutionary as everyone makes it out to be. Everything boils down to techniques and frameworks for collecting context/prompt before handing it over to the model.
Comment by darkwater 3 days ago
Comment by theshrike79 2 days ago
Like a code review skill would have scripts that read the actual code.
Comment by darkwater 2 days ago
Comment by alkonaut 3 days ago
Comment by ryanhecht 9 hours ago
Comment by photios 3 days ago
OpenCode can use Copilot natively: https://opencode.ai/docs/providers/#github-copilot
I got Claude Code running with Copilot APIs via the LiteLLM proxy, but it was a pain in the butt. Just use OpenCode.
Comment by JeremyNT 3 days ago
The CLI tool matters. If you're not using opencode/claude you're missing out. But the latest OpenAI models are really quite good.
Comment by ryanhecht 9 hours ago
Comment by kaycey2022 3 days ago
Agentic coding has come a long way though. What you are describing sounds like a trust issue more than a skill issue. Some git scumming should fix that. Maybe what I’m going through is also a trust issue.
Comment by Bukhmanizer 4 days ago
Comment by straydusk 3 days ago
Comment by aj_g 3 days ago
Comment by iwrrtp69 3 days ago
Comment by alex_suzuki 3 days ago
Comment by a_victorp 3 days ago
Comment by sebastiennight 3 days ago
Comment by ares623 3 days ago
Comment by 0xbadcafebee 3 days ago
About 99% of the blogs [written by humans] that reach HN's front page are fundamentally incorrect. It's mostly hot takes by confident neophytes. If it's AI-written, it actually comes close to factual. The thing you don't like is usually right, the thing you like is usually wrong. And that's fine if you'd rather read fiction. Just know what you're getting yourself into.
Comment by Balinares 3 days ago
I am ceaselessly fascinated by how we can all live in the same world yet seemingly inhabit such vastly different realities.
Comment by aitchnyu 3 days ago
Comment by zuInnp 3 days ago
Comment by wiseowise 3 days ago
Comment by wesselbindt 3 days ago
Comment by ffsm8 3 days ago
The other day we were discussing a new core architecture for a Microservice we were meant to split out of a "larger" Microservice so that separate teams could maintain each part.
Instead of discussing it entirely without any basis, I made a quick prototype via explicit prompts telling the LLM exactly what to create, where, etc.
Finally, I asked it to go through the implementation and create a wiki page, concatting the code and outlining in 1-4 sentences above each "file" excerpt what the goal for the file is.
In the end, I went through it to double-check whether it held up to my intentions - which it did, so I didn't change anything.
Now we could all discuss the pros and cons of that architecture while going through it, and the intro sentence gave enough context to each code excerpt to improve understanding/reduce mental load as necessary context was added to each segment.
I would not have been able to allot that time to do all this without an LLM - especially the summarization to 1-3 sentences, so I'll have to disagree when you state this generally.
Though I definitely agree that a blog article like this isn't worth reading if the author couldn't even be arsed to write it themselves.
Comment by gbnwl 3 days ago
“Plan phase – The LLM generates a fixed sequence of tool calls before seeing any untrusted data
Execution phase – A controller runs that exact sequence. Tool outputs may shape parameters, but cannot change which tools run”
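Taken literally, that describes a control loop roughly like the sketch below (stub tools, and the "plan" hard-coded where the model's output would go):

    # Plan phase: the LLM is asked for a fixed sequence of tool calls *before*
    # it sees any untrusted data. Hard-coded here; in the described pattern this
    # list would come out of the model.
    plan = [
        {"tool": "fetch_tickets", "args": {}},
        {"tool": "summarise",  "args": {"text": "$0"}},  # "$0" = output of step 0
        {"tool": "send_email", "args": {"body": "$1"}},
    ]

    # Stub tools, just to make the control flow concrete.
    TOOLS = {
        "fetch_tickets": lambda: "ticket 1: printer on fire\nticket 2: vpn is down",
        "summarise": lambda text: str(len(text.splitlines())) + " open tickets",
        "send_email": lambda body: "sent: " + body,
    }

    def resolve(value, results):
        """Tool outputs may fill in parameters of later steps ($N = step N's output)."""
        if isinstance(value, str) and value.startswith("$"):
            return results[int(value[1:])]
        return value

    # Execution phase: a plain controller runs exactly the planned sequence.
    # Outputs can shape parameters, but nothing here can add or reorder tools.
    results = []
    for step in plan:
        args = {k: resolve(v, results) for k, v in step["args"].items()}
        results.append(TOOLS[step["tool"]](**args))

    print(results[-1])  # -> "sent: 2 open tickets"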
But of course the agent doesn’t plan an exact fixed sequence of tool calls and rigidly stick to it, as it’s going to respond to the outputs which can’t be known ahead of time. Anyone who’s watched Claude work has seen this literally every day.
This is just more slop making it to the top of HN because people out of the loop want to catch up on agents and bookmark any source that seems promising.
Comment by Bishonen88 3 days ago
Comment by jbstack 3 days ago
Comment by simianparrot 3 days ago
Comment by jbstack 3 days ago
Comment by simianparrot 3 days ago
Comment by wiseowise 4 days ago
Comment by 63stack 3 days ago
Comment by bandrami 3 days ago
Comment by throwaway_0236 3 days ago
One of the better ones was "Unified LLM Interaction Model (ULIM)". You read it here first...
Comment by mellosouls 3 days ago
Scaled GitHub stars to 20,000+
Built engaged communities across platforms (2.8K X, 5.4K LinkedIn, 700+ YouTube)
etc, etc.
No doubt impressive to marketing types but maybe a pinch of salt required for using AI Agents in production.
Comment by embedding-shape 3 days ago
Comment by somebehemoth 3 days ago
Comment by ozim 3 days ago
But that's a dead giveaway that he is just scaling GitHub stars, not doing actual research.
Comment by N_Lens 4 days ago
The pipe dream of agents handling Github Issue -> PullRequest -> Resolve Issue becomes a nightmare of fixing downstream regressions or other chaos unleashed by agents given too much privilege. I think people optimistic on agents are either naive or hype merchants grifting/shilling.
I can understand the grinning panic of the hype merchants because we've collectively shovelled so much capital into AI with very little to show for it so far. Not to say that AI is useless, far from it, but there's far more over-optimism than realistic assessment of the actual accuracy and capabilities.
Comment by nulone 3 days ago
Comment by aaronrobinson 4 days ago
Comment by a_victorp 3 days ago
Comment by njhnjh 3 days ago
Comment by embedding-shape 3 days ago
Already a "no", the bottleneck is "drowning under your own slop". Ever noticed how fast agents seems to be able to do their work in the beginning of the project, but the larger it grows, it seems to get slower at doing good changes that doesn't break other things?
This is because you're missing the "engineering" part of software engineering, where someone has to think about the domain, design, tradeoffs and how something will be used, which requires good judgement and good wisdom regarding what is a suitable and good design considering what you want to do.
Lately (last year or so), more client jobs of mine have basically been "Hey, so we have this project that someone made with LLMs, they basically don't know how it works, but now we have a ton of users, could you redo it properly?", and in all cases, the applications have been built with zero engineering and with zero (human) regards to design and architecture.
I have not yet had any clients come to me and say "Hey, our current vibe-coders are all busy and don't have time, help us with X"; it's always "We've built hairball X, rescue us please?", and that to me makes it pretty obvious what the biggest bottleneck with this sort of coding is.
Moving slower is usually faster long-term, provided you think about the design, but it's obviously slower short-term, which makes it kind of counter-intuitive.
Comment by catlifeonmars 3 days ago
Like an old mentor of mine used to say:
“Slow is smooth; smooth is fast”
Comment by ajjahs 3 days ago
Comment by comboy 3 days ago
So with the top performers I think what's most effective is just stating clearly what you want the end result to be (with maybe some hints for verifying the results, which is just clarifying the intent further).
Comment by _pdp_ 3 days ago
Comment by at__ 3 days ago
Comment by kstenerud 3 days ago
In one week, I fine-tuned https://github.com/kstenerud/bonjson/ for maximum decoding efficiency and:
* Had Claude do a go version (https://github.com/kstenerud/go-bonjson), which outperforms the JSON codec.
* Had Claude do a Rust version (https://github.com/kstenerud/rs-bonjson), which outperforms the JSON codec.
* Had Claude do a Swift version (https://github.com/kstenerud/swift-bonjson), which outperforms the JSON codec (although this one took some time due to the Codable, Encoder, Decoder interfaces).
* Have Claude doing a Python version with Rust underpinnings (making this fast is proving challenging)
* Have Claude doing a Jackson version (in progress, seems to be not too bad)
In ONE week.
This would have taken me a year otherwise, getting the base library going, getting a test runner going for the universal tests, figuring out how good the SIMD support is and what intrinsics I can use, what's the best tooling for hot path analysis, trying various approaches, etc etc. x5.
Now all I do is give Claude a prompt, a spec, and some hand-holding for the optimization phase (admittedly, it starts off at 10x slower, so you have to watch the algorithms it uses). But it's head-and-shoulders above what I could do in the last iteration of Claude.
I can experiment super quickly: Try caching previously encountered keys and show me the performance change. 5 mins, done. Would take me a LOT longer to retool the code just for a quick test. Experiments are dirt cheap now.
The biggest bottleneck right now is that I keep hitting my token limits 1-2 hours before each reset ;-)
Comment by vivzkestrel 3 days ago
Comment by MrOrelliOReilly 4 days ago
Comment by Kerrick 3 days ago
[1]: https://kerrick.blog/articles/2025/use-ai-to-stand-in-for-a-...
Comment by MrOrelliOReilly 3 days ago
Comment by bluehat974 3 days ago
Comment by 63stack 3 days ago
Bullet point lists! Cool infographics! Foreign words in headings! 93 pages of problem statement -> solution! More bullet points as tradeoffs breakdown! UPDATED! NEW!
Comment by epolanski 3 days ago
Comment by wiseowise 3 days ago
How you know something is done either by a grifter or a starving student looking for work.
Comment by 0xbadcafebee 3 days ago
- Generate a stable sequence of steps (a plan), then carry it out. Prevents malicious or unintended tool actions from altering the strategy mid-execution and improves reliability on complex tasks.
- Provide a clear goal and toolset. Let the agent determine the orchestration. Increases flexibility and scalability of autonomous workflows.
- Have the agent generate, self-critique, and refine results until a quality threshold is met (see the sketch after this list).
- Provide mechanisms to interrupt and redirect the agent’s process before wasted effort or errors escalate. Effective systems blend agent autonomy with human oversight. Agents should signal confidence and make reasoning visible; humans should intervene or hand off control fluidly.
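The third one is the easiest to wire up yourself. The loop is basically this (a sketch: claude -p stands in for whatever model call you use, and the quality threshold is a stub):

    import subprocess

    def ask(prompt: str) -> str:
        """One fresh model call per step (claude -p used here as an example)."""
        return subprocess.run(["claude", "-p", prompt],
                              capture_output=True, text=True, check=True).stdout

    def good_enough(critique: str) -> bool:
        # Stub quality threshold: stop when the critique reports no issues.
        return "NO ISSUES" in critique.upper()

    task = "Write a release announcement for version 2.0 of our CLI tool."
    draft = ask(task)

    for _ in range(3):  # cap the refinement loop so it can't spin forever
        critique = ask("Critique the following text against the task. "
                       "If it is fine, reply with exactly NO ISSUES.\n\n"
                       "TASK: " + task + "\n\nTEXT:\n" + draft)
        if good_enough(critique):
            break
        draft = ask("Rewrite the text to address the critique.\n\n"
                    "TEXT:\n" + draft + "\n\nCRITIQUE:\n" + critique)

    print(draft)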
If you've ever heard of "continuous improvement", now is the time to learn how that works, and hook that into your AI agents.
Comment by galaxyLogic 3 days ago
I mean, why do I need to read on HN what to do, if AI is so knowledgeable and even agentic?
Comment by 0xbadcafebee 3 days ago
Comment by vemv 3 days ago
I've flagged it, that's what we should be doing with AI content.
Comment by laborcontract 3 days ago
But scrap that, it's better just thinking about agent patterns from scratch. It's a green field and, unless you consider yourself profoundly uncreative, the process of thinking through agent coordination is going to yield much greater benefit than eating ideas about patterns through a tube.
0: https://arxiv.org/search/?query=agent+architecture&searchtyp...
Comment by dist-epoch 3 days ago
It literally gets "stuck" and becomes un-scrollable.
Comment by drdrek 3 days ago
Comment by verdverm 4 days ago
thanks for the share!