Apache Burr: Build reliable AI agents and applications
Posted by anhldbk 6 days ago
Comments
Comment by brotchie 6 days ago
BUT, if you boil it down, an agent really is context building, making an LLM call, executing requested tool calls, parsing the final model output, returning it to some frontend. There's extensions like memory, async tool calls, etc, but not THAT complicated from a traditional software engineering perspective.
Everyone seems to want to build their agent framework. But if you're tasked with building an agent, I've found it much easier and more maintainable to just build 1:1 code for THAT agent: most of the abstractions you get from an agent framework purely get in the way and obfuscate core agent logic.
You end up being forced to use the abstractions chosen by the agent framework, which sometimes are a mismatch for what you're actually trying to do.
Comment by peterbell_nyc 6 days ago
Don't need to get it all from one vendor, but that feels to me like the toolkit and for most use cases I'd argue: - Don't limit yourself to a single model provider (anthropic, openai, etc) - Own your context - Own your compounding
Comment by sdesol 6 days ago
> Context management so the right agents have the right context for the right sessions at the right time
I'm going to do a show HN tomorrow that explains how you can give your agents years of experience. The basic idea is, you would commit in your repo or download manifests (JSON files) that can be converted to "Brains" (SQLite databases). Each brain can have its own properties.
For example, I provide a "code intent" analyzer (instructions for AI) that says when analyzing a file, extract this metadata. For the code intent analyzer, I have the AI extract a single sentence purpose for the file. So if you execute:
gsc rg cache --db code-intent --fields purpose
you get all matches for 'cache' plus the matching file's purpose like "Modify file to update caching strategy". This is how the agent can tell if the file is talking about cache vs. whether this file is what you should change if you want to update the caching strategy.
So for what you described, you can have a brain for different stages of a task. It can be as simple as, in the planning stage, make sure you do this if you need to touch this file.
I am working on a rust-blast-radius brain that uses `syn` + AI generated metadata to help you understand "what if I changed this file, what would be affected". With the rust-blast-radius brain, the AI can summarize the types of files that will be affected without having to open the file based on what has been changed or discussed.
So you can have a rule like, if I make changes to a Rust file, make sure to do a blast radius analysis so we don't forget to consider something.
Does this align with what you are looking for?
Comment by andai 6 days ago
Another thing I've been thinking is how, most parts of a file are not relevant to the whole system.
Like there are parts where they intersect, and those seem to be the most important ones for capturing the big picture. You wanna be able to see the entire "skeleton".
So I thought the summary maybe shouldn't be English but it should be a subset of the code — the subset that's relevant to the rest of the program.
`grep import` gets you 90% of the way there.
Comment by sdesol 6 days ago
https://github.com/gitsense/chat/blob/main/base-state/analyz...
In your chat with AI, include the above file and let it know what your requirements are and I can create the analyzer and include it.
You can also think of my tool as data prepping tool. So if you have a clear prompt the AI can review the file during analysis and remove all unnecessary code so the extracted metadata will the stripped text which you can use search against.
Comment by andai 6 days ago
I think the best way to generate these is with a sub-agent. Tell it to try and solve a problem that involves editing this file, and see what it starts grepping for.
This ties in with this idea that the tools and designs should be what comes naturally to the LLM, i.e. what it's already been trained on. And the most straightforward way to do that is to let it reach for it.
Like when you reach in the darkness for an object. Where your hand lands is exactly where it should be.
Comment by sdesol 6 days ago
I need to modify OpenAI's Codex agent to support slash commands that can help humans better guide agents, and I needed a solution with the least impact. They don't accept contributions so I need to plan for syncing with the upstream.
Comment by andai 6 days ago
Are you running it with official or custom models? I've been trying to get custom models working in Codex and haven't been able to figure it out. (A lot of providers support Responses API, but they don't actually work with Codex.)
Comment by sdesol 5 days ago
I created a new brain that helped me find the answer for what you described:
> Codex does not just need a /v1/responses endpoint. It needs an OpenAI Responses-compatible agent surface. Many providers implement enough Responses API for text streaming, but not enough for Codex’s tool-call loop and event mapping.
I can understand why they might have done this for performance and/or lock-in and/or AI thinking reasons.
I don't think I will create a translation layer, as that would be a sync nightmare, so based on what I found and what you said, it doesn't look like you can use other providers unless you introduce a proxy layer to translate things.
I should also note, even if you have the translation layer, you might end up breaking harness capabilities.
I am going to update
https://github.com/gitsense/smart-codex
to include the `codex-rust-navigation` brain that you can use to chat with AI about. And you will probably want to use it since `gpt-5.5` estimated that 25 - 50 files did not have to be read:
> Roughly 300-500 files avoided, with a defensible lower bound around 25-50 files.
This brain is designed specifically for rust files so you will need to use code-intent if you want to ask more documention/config questions.
Comment by internet101010 6 days ago
Comment by kristjansson 6 days ago
Comment by cpard 6 days ago
Comment by pjmlp 6 days ago
Comment by throw1234567891 6 days ago
Comment by tomrod 6 days ago
Comment by peterbell_nyc 6 days ago
Comment by chuckadams 6 days ago
Comment by krawczstef 6 days ago
Comment by lukebuehler 6 days ago
I think that's why agent SDKs feel like the wrong abstraction. If you are writing a workflow, use a workflow engine (Airflow, Temporal, etc), and call some LLMs with a small LLM library. If you need a "real" agent, use a full-featured agent harness, like Codex or CC or Pi or whatever, then load it up with all the tools, skills, mcps that it needs, and let it rip.
Incidentally I've been building a full featured agent harness that runs inside durable workflow engines [0], but it is designed _not_ as an SDK but rather as a standalone, full-featured harness with an API.
Comment by dd8601fn 6 days ago
But sometimes people just need something to do, or something fun to play with, and “the next guy” rarely matters that much… so who cares that you’ve saddled them with the result of your paid playtime?
Comment by andai 6 days ago
And that was before I could just ask the computer to make it for me!
But most people seem to be the other way around. They'd rather deal with abstractions and boilerplate instead of writing the actual code.
Comment by mbreese 6 days ago
Where I'm starting to question this is maintainability. When I come up with a new technique or way of doing something in my new agent, how can I update an older agent. Do I want to update the older agent?
But, I get what you're talking about w.r.t. building for the exact problem at hand. For example, I'm guessing that Apache Burr has support for a plugin-able vector RAG system (or at least it will if it doesn't now). That's great, but I want my RAG system to add documents to the context and keep them as part of an updated system prompt with some very specific tweaks that happen as part of that process. This is a bespoke way of working with an existing concept (RAG) that doesn't lend itself to using any specific framework.
In my use-case, bespoke is the way to go. But then I'm still stuck with having to make engineering choices for updating older agents. So, I see your point.
Comment by cyanydeez 6 days ago
Obviously, you could have a different LLM like a "angel" that prunes a primary agent of the context it doesn't need, but I think the realistic KV cache problem is will determine the optimal structure: you want the work do be done in the most efficience KV cache (context-reuse) as much as possible.
There's definitely more to it than just spawning agents.
Comment by freakynit 6 days ago
Comment by hilariously 6 days ago
4 months of mostly spinning their wheels later they launched a really lackluster OC product that's effectively DOA.
Comment by tcdent 6 days ago
When building an agentic workflow there are enough primitives that rewriting them from scratch every time makes zero sense.
What is a tool? How does the LLM understand the tool? Formatting a native function into a serializable input/output pattern makes sense to generalize and that does not need to exist repeated in everyones application code.
We use libraries to interact with the APIs themselves; nobody would say writing a spec-compliant API client was poor practice. Agentic harnesses are just one layer above: I need to call the API and I need to do it with certain expected conventions.
Comment by hilariously 6 days ago
One, obviously yes OC contains a lot more than a harness, but my point was that it was too much for their use case and constrained their choices, not enabled them, and that choosing the right layer of abstraction is important.
There's good indirection/abstraction and there's ones that do not serve your use case, eg what was obviously day one regarding Langchain.
Comment by pianopatrick 6 days ago
And just like when people were trying to figure out which sorting algorithm made the most sense, we are all just trying to figure out which prompt algorithms with which models lead to good results.
Comment by chuckadams 6 days ago
Comment by vanuatu 6 days ago
the hard part about building agents isnt the framework it's discovery, context, traditional engineering, handling the last mile
there are some invariants like the loop, tools, observability, guardrails, monitors etc...
Comment by brotchie 6 days ago
The better pitch would be, "this is how easy observability, guardrails, monitoring, deployment, evals, versioning, A/B testing are with our framework." What the agent code looks like is somewhat incidental.
Comment by peterbell_nyc 6 days ago
Anyone have something they genuinely like for all of this? For now I'm rolling my own, but I can't believe I won't find a better OSS alternative soon...
Comment by agentdev001 6 days ago
Observability is, for my purposes, solved by a given framework supporting OpenTelemetry.
Guardrails is where I've gotten the most value of openshell being a neat package. Agent workload scope is written as policy in openshell, and capability is backed by openshell handling all execution.
Monitoring/deployment/versioning is helped as well, depending on how agents/runners are slotted into the system. Deployment namely is quite well supported- openshell has kube/helm bits that are experimental atm, but seem like a logical approach imho.
Evals and a/b testing isnt something ive explored in depth, considering that agents with composable tool sets + frontier models are beyond my expectations already.
Comment by brotchie 6 days ago
Comment by msradam 18 hours ago
Comment by elijahbenizzy 6 days ago
Comment by fxwin 6 days ago
Comment by freakynit 6 days ago
Comment by trollbridge 6 days ago
Comment by peterbell_nyc 6 days ago
Comment by elijahbenizzy 6 days ago
Comment by toddmorey 6 days ago
Comment by bko 6 days ago
Then you have a general workflow that has a set of skills (prompts) and tools. And that could be recursive.
So if you do something like "rename this file" you have to build up a workflow like:
[classifier]
what's the workflow -> rename
[rename workflow]
list files (tool call)
figure out relevant predicate (LLM)
convert predicate into a filter query give the context of the files (LLM)
figure out what you want the new name to be (LLM)
create the request body and hit the tool
approval workflow
formatting
It's a lot to manage and orchestrate and that's just one simple example. You'd like want to use the same building blocks to delete a file or move it. Even to know the right concepts is difficult as we're a bit deluded on whats going on in the background of these modern AI apps like Claude and GPT that do a lot of this stuff for you
Comment by krawczstef 6 days ago
Burr just helps you, the engineer, to really control the primitives. Then adds some cool features you don't have to think about -- like observability :)
Comment by agentifysh 6 days ago
you dont need a framework
Comment by aplomb1026 6 days ago
Comment by eranation 6 days ago
It hits every AI generated landing page trope possible.
Or was it done ironically?
Comment by glenngillen 6 days ago
I do suspect it was built with some form of AI though because the handful of links I've tried to dig into have all linked to the wrong place/are invalid :/
Comment by n2h4 6 days ago
Comment by hmokiguess 6 days ago
Comment by thedougd 6 days ago
I’ve been playing with this stack and left wondering if Strands provides any secret sauce with Agent Core. So far it doesn’t feel that way and sometimes they even feel at odds with each other.
Comment by fnordpiglet 6 days ago
I’ll need to dig into burr - I’m not finding strands has an insurmountable maturity to it but it’s not carrying a lot of weird opinion (like some of the other OSS frameworks seem to be highly infected with) and is pretty practical in what it exposed and does, and is most like the agentic frameworks I’ve worked with inside FAANG. If burr can meet that and keep growing I’d probably look at moving to it as I also get a sense strands has a bit of the Amazon “highly probable to be abandoned once the managers and PMs get promoted” feeling.
Comment by elijahbenizzy 5 days ago
Comment by msradam 6 days ago
I am currently working on skills-to-state-machine conversions, since a lot of popular skills out there are already written as phases for an AI model to follow, so it would be great to leverage the explicit functionality of Burr to make that more reliable. Thank you for this amazing project.
Comment by tcdent 6 days ago
Yes, Python has decorators, but they're best used as "filters" that apply to functions or methods. Cache this, serialize the output of this function always, prepare this function to be used as a tool by an agentic harness. Not registration, not flow control. You may disagree but someone has to say it; FastAPI influenced the modern use of decorators far too much in the wrong direction.
Builder patterns are a Rust convention, because Rust has no named keyword arguments. A Python function already exposes a named contract. There is very little reason to ever to sequentially pass configuration parameters in chained method calls. If you need to add state that doesn't exist yet to a constructor or factory, that is not a builder pattern. That is registration. The one place where builder patterns should be tolerated is query builders. They iteratively build on a concept and having the additional "slot" for metadata (method name plus keyword arguments) is genuinely useful. Using methods which accept single parameter instead of keyword arguments is incorrect.
Comment by elijahbenizzy 6 days ago
Comment by mkarrmann 6 days ago
Comment by giancarlostoro 6 days ago
Comment by tcdent 6 days ago
Comment by giancarlostoro 6 days ago
Comment by anentropic 6 days ago
Comment by Oras 6 days ago
Comment by elric 6 days ago
Comment by Oras 6 days ago
It might sounded that I’m against the move, but I’m just curious as what apache found in the platform to get incubated
Comment by krawczstef 6 days ago
Comment by pratio 6 days ago
Comment by elijahbenizzy 6 days ago
Comment by bananamogul 6 days ago
[1] https://en.wikipedia.org/wiki/Burr%E2%80%93Hamilton_duel
Comment by elijahbenizzy 6 days ago
Comment by otterley 6 days ago
Comment by mzaccari 6 days ago
Comment by abirch 6 days ago
Comment by elijahbenizzy 6 days ago
Comment by suction 6 days ago
Comment by lnenad 6 days ago
Comment by doublerabbit 6 days ago
Comment by nico 6 days ago
Ideally self-hostable/open source
I know claude code has a lot of that internally built in already, but it’s claude-only
Comment by sgc 6 days ago
Comment by nico 6 days ago
Comment by sgc 4 days ago
Comment by sabzil37 4 days ago
Comment by flakiness 6 days ago
Comment by doublerabbit 6 days ago
Comment by fantasizr 6 days ago
Comment by elijahbenizzy 6 days ago
Comment by mooreds 6 days ago
I searched the docs for authentication and mcp (one of the protocols which, among other things, handles some pieces of authentication/authorization) but didn't see any results.
What did I miss?
Comment by yaodub 6 days ago
Comment by redlewel 6 days ago
Also "700+ Discord Members" is not any type of endorsement of a technology or service.
Comment by ivanmontillam 6 days ago
Might be my IRC mind talking, but a proprietary service like Discord, or Slack, or even Telegram at times, is just not suitable for the target community, as there's often a data privacy concern.
I never had a Discord account, and I'm hoping I can keep that streak.
Comment by werdnapk 6 days ago
Comment by amne 6 days ago
Comment by coneonthefloor 6 days ago
Comment by suction 6 days ago
Comment by coneonthefloor 6 days ago
If that’s all the effort you are going through to make your brochure site, I can only assume the same care was given to the actual product.
Comment by vanuatu 6 days ago
reddit user testimonial
framework is for state machines
why man..
Comment by ivanmontillam 6 days ago
Comment by pixel_popping 6 days ago
Comment by pramodbiligiri 6 days ago
Comment by enragedcacti 6 days ago
so far I'm seeing: GradientText, Animated button, EyebrowPill, Aurora background, MockIDE, LogoRow, SlippyWords, StatCounter, CommunityBadge
also: "No DSL, no YAML — just Python functions and decorators."
'It's not X, its Y' but with an added em dash is crazy work.
Comment by doublerabbit 6 days ago
Comment by krawczstef 6 days ago
Comment by chill_ai_guy 6 days ago
Comment by elijahbenizzy 6 days ago
Comment by _pdp_ 6 days ago
Reliable means "can it finish the job that was tasked to do". It certainly has nothing to do with state machines.
Comment by drchaim 6 days ago
Comment by big-chungus4 6 days ago
Comment by hbarka 6 days ago
Comment by shhhhhplease 6 days ago
Comment by krawczstef 6 days ago
Comment by schainks 5 days ago
Comment by g42gregory 6 days ago
Comment by vivekchand19 6 days ago
Comment by CuriouslyC 6 days ago
Comment by hedgehog 6 days ago
Comment by TZubiri 6 days ago
Comment by iririririr 6 days ago
Comment by Meneth 6 days ago
Comment by cohix 6 days ago
Comment by sermakarevich 6 days ago
I found the Spec Driven Development approach works nicely for me, and I've been using it since Feb 2026 for all my mid+ size projects. I started with the GSD plugin, but it soon became too heavy, so I implemented my own lightweight SDD-based workflow for Claude. A friend of mine ported it to gemini-cli, and that version was added to Google's approved third-party frameworks for internal usage. The idea is to decompose feature implementation into multiple steps, task implementation into multiple subtasks, and be able to clear the context after every task/implementation.
Repo: https://github.com/sermakarevich/sddw
Slides: https://docs.google.com/presentation/d/1SjKXF7hkoqyiN9-3tBGY...
When SDD was not enough, I started playing with scaling a single AI worker to multiple, and then got to an agent swarm. Built on top of a centralized Beads database, claude -p headless execution, a UI, a custom ask_user MCP, and Telegram integrations, fleet (the app name) lets me add many tasks in advance, control the number of workers executing them, and use any kind of coder/model. It works nicely with the SDDW implementation phase. It shines when you keep creating tasks, define dependencies between them, and give clear descriptions. For personal projects I can queue up 70 tasks for an overnight run, set the number of workers to 1 to not be blocked by usage limits, and let it roll.
Repo: https://github.com/sermakarevich/fleet
Slides: https://docs.google.com/presentation/d/1O_pXyKdtpRG2ORD1xw7s...
Since Fable 5 appeared, I've been changing the way I work with fleet. Instead of adding tasks/descriptions/dependencies to fleet myself, I talk to Fable: specify the goal, ensure understanding, and let Fable 5 add tasks to fleet. Fable is expensive, but in this setup it doesn't code — it just investigates, designs, decomposes, and creates tasks. Workers use the cheaper Sonnet 4.6 model.
Reliability comes with task implementation decomposition into multiple steps, feature decomposition into many smaller and simpler subtasks, having better description, clean and focused context.
Comment by helezon77 6 days ago
Comment by xuanlin314 6 days ago
Comment by agentrank_ai 6 days ago
Comment by offercc 6 days ago
Comment by throwaway613746 6 days ago