Gas Town's agent patterns, design bottlenecks, and vibecoding at scale
Posted by pavel_lishin 1 day ago
Comments
Comment by mediaman 23 hours ago
It pushes and crosses boundaries, it is a mixture of technology and art, it is provocative. It takes stochastic neural nets and mashes them together in bizarre ways to see if anything coherent comes out the other end.
And the reaction is a bunch of Very Serious Engineers who cross their arms and harumph at it for being Unprofessional and Not Serious and Not Ready For Production.
I often feel like our industry has lost its sense of whimsy and experimentation from the early days, when people tried weird things to see what would work and what wouldn't.
Maybe it's because we also have suits telling us we have to use neural nets everywhere for everything Or Else, and there's no sense of fun in that.
Maybe it's the natural consequence of large-scale professionalization, and stock option plans and RSUs and levels and sprints and PMs, that today's gray hoodie is just the updated gray suit of the past but with no less dryness of imagination.
Comment by hyperpape 23 hours ago
So, Steve has the big scary "YOU WILL DIE" statements in there, but he also has this:
> I went ahead and built what’s next. First I predicted it, back in March, in Revenge of the Junior Developer. I predicted someone would lash the Claude Code camels together into chariots, and that is exactly what I’ve done with Gas Town. I’ve tamed them to where you can use 20–30 at once, productively, on a sustained basis.
"What's next"? Not an experiment. A prediction about how we'll work. The word "productively"? "Productively" is not just "a big fun experiment." "Productively" is what you say when you've got something people should use.
Even when he's giving the warnings, he says things like "If you have any doubt whatsoever, then you can’t use it" implying that it's ready for the right sort of person to use, or "Working effectively in Gas Town involves committing to vibe coding.", implying that working effectively with it is possible.
Every day, I go on Hacker News, and see the responses to a post where someone has an inconsistent message in their blog post like this.
If you say two different and contradictory things, and do not very explicitly resolve them, and say which one is the final answer, you will get blamed for both things you said, and you will not be entitled to complain about it, because you did it to yourself.
Comment by an0malous 22 hours ago
Comment by bloppe 13 minutes ago
Comment by dada216 3 hours ago
Comment by cthalupa 13 hours ago
I don't spend much time on LinkedIn, but basically every comment I've read on HN is that, at best, Gas Town can pump out a huge amount of "working" code in short timeframes at obscene costs.
The overwhelming majority are saying "This is neat, and this might be the rough shape of what comes next in agentic coding, but it's almost certainly not going to be Gas Town itself."
I have seen basically no one say that Gas Town is The Thing.
Comment by jmspring 16 hours ago
Is Gas Town the implementation? I'm not sure.
What is interesting is seeing how this paradigm can help improve one's workflow. There is still a lot of guidance and structuring of prompts / claude.md / whichever files that need to be carefully written.
If there is a push for the equivalent of helm charts and crds for gas town, then I will be concerned.
Comment by storystarling 1 hour ago
Comment by Treegarden 5 hours ago
Comment by pxtail 2 hours ago
Comment by pstuart 20 hours ago
Gastown looks like a viable avenue for some app development. One of the most interesting things I've noticed about AI development is that it forces one to articulate desired and prohibited behaviors -- a spec becomes a true driving force.
Yegge's posts are always hyperbolic and he consistently presents interesting takes on the industry so I'm willing to cut him a buttload of slack.
Comment by dingnuts 16 hours ago
Comment by spacecadet 5 hours ago
Welcome to being a member of a product team who cares beyond just what's on their screen... Honestly, there is a humbling moment coming for everyone, and I'm not sure it's unemployment.
Comment by lowbloodsugar 20 hours ago
Comment by meowface 21 hours ago
I think ideas from it will probably partially inspire future, simpler systems.
Comment by wonnage 20 hours ago
Comment by adabyron 14 hours ago
Comment by leoc 7 hours ago
Comment by lupire 19 hours ago
Comment by wahnfrieden 18 hours ago
And FOMO stories about missing out on Bitcoin when he knew about it, so he doesn't want you to miss out on this new opportunity to get "filthy rich" as an "investor" while you still can.
Comment by torginus 4 hours ago
The swings on BTC price are absolutely insane, and ETH even more so (which is even more risky, without showing higher gains).
Comment by wahnfrieden 11 hours ago
Comment by DonHopkins 2 hours ago
https://github.com/SimHacker/moollm/tree/main/skills/economy...
The official currency of MOOLLM is MOOLAH. It uses PROOF OF MILK consensus — udderly legen-dairy interga-lactic shit coin, without the bull.
Comment by meowface 6 hours ago
Comment by JasonADrury 6 hours ago
In the past this would've been different when you couldn't necessarily expect all participants to be fully aware of what's going on, but absolutely nobody is treating "gas town" coins as a serious investment.
Comment by jaapz 5 hours ago
Comment by JasonADrury 3 hours ago
Comment by jaapz 3 hours ago
Comment by JasonADrury 3 hours ago
https://apps.apple.com/app/bags-financial-messenger/id647319...
The tagline of the app? "Buy & sell memecoins". Transparently advertised as a crowdfunding mechanism using memecoins.
Comment by wahnfrieden 2 hours ago
Comment by meowface 6 hours ago
Comment by JasonADrury 3 hours ago
Comment by wahnfrieden 2 hours ago
Comment by victorbjorklund 5 hours ago
Comment by JasonADrury 3 hours ago
Sure! Are those people buying bags.fm tokens? Probably not.
This isn't even marketed as an investment https://bags.fm but a crowdfunding tool for developers with a casino attached.
You don't have to be smart to read the big text on the website.
Comment by wahnfrieden 2 hours ago
Comment by torginus 4 hours ago
donations?
Comment by cap11235 12 hours ago
Comment by meowface 4 hours ago
Comment by csallen 17 hours ago
Note the word "future" not "present". People are making a prediction of where things will go. I haven't seen a single person saying that Gas Town as it exists today is ready for a production-grade engineering project.
Comment by potatolicious 19 hours ago
If I can be a bit bold, I'd observe that this tic is also a very old rhetorical trick you see in our industry. Call it Schrodinger's Modest Proposal if you will.
In it someone writes something provocative, but casts it as both a joke and deadly serious at various points. Depending on how the audience reacts they can then double down on it being all-in-good-jest or yes-absolutely-totally. People who enjoy the author will explain the nonsensical tension as "nuance".
You see it in rationalist writing all the time. It's a tiresome rhetorical "trick" that doesn't fool anyone any more.
Comment by directevolve 19 hours ago
Comment by sdwr 17 hours ago
Comment by directevolve 13 hours ago
Comment by antonvs 6 hours ago
> "...philosopher Nicholas Shackel coined the term 'motte-and-bailey' to describe the rhetorical strategy in which a debater retreats to an uncontroversial claim when challenged on a controversial one."
-- https://heterodoxacademy.org/blog/the-motte-and-the-bailey-a...
Comment by csallen 17 hours ago
- "what's next" does not mean "production quality" and is in no way mutually exclusive with "experimental". It means exactly what it says, which is that what comes next in the evolution of LLM-based coding is orchestration of numerous agents. It does not somehow mean that his orchestrator writes production-grade code and I don't really understand why one would think it does mean that.
- "productively" also does not mean "production quality". It means getting things done, not getting things done at production-grade quality. Someone can be a productive tinkerer or they can be a productive engineer on enterprise software. Just because they have the word "product" in them does not make them the same word.
- "working effectively" is a phrase taken out of the context of this extremely clear paragraph which is saying the opposite of production-grade: "Working effectively in Gas Town involves committing to vibe coding. Work becomes fluid, an uncountable substance that you sling around freely, like slopping shiny fish into wooden barrels at the docks. Most work gets done; some work gets lost."
If he wanted to say that Gas Town wrote production grade code, he would have said that somewhere in his 8000-word post. But he did not. In fact, he said the opposite, many many many many many many times.
You're taking individual words out of context, using them to build a strawman representing a promise he never came close to making, and then attacking that strawman.
What possible motivation could you have for doing this? I have no idea.
> If you say two different and contradictory things...
He did not. Nothing in the blog post explicitly says or even remotely implies that this is production quality software. In addition, the post explicitly, unambiguously, and repeatedly screams at you that this is highly experimental, unreliable, spaghetti code, meant for writing spaghetti code.
The blog post could not possibly have been more clear.
> ...because you did it to yourself.
No, you're doing this to his words.
Don't believe me? Copy-paste his post into any LLM and ask it whether the post is contradictory or whether it's ambiguous whether this is production-grade software or not. No objective reader of this would come to the conclusion that it's ambiguous or misleading.
Comment by madhadron 12 hours ago
That's hilarious! You might want to add a bit more transition for the joke before the other points above, though.
Comment by airza 8 hours ago
Bleak
Comment by drewbug01 22 hours ago
Our industry is held back in so many ways by engineers clinging to black-and-white thinking.
Sometimes there isn’t a “final” answer, and sometimes there is no “right” answer. Sometimes two conflicting ideas can be “true” and “correct” simultaneously.
It would do us a world of good to get comfortable with that.
Comment by hyperpape 21 hours ago
The final answer can be "each of these positions has merit, and I don't know which is right." It can be "I don't understand what's going on here." It can be "I've raised some questions."
The final answer is not "the final answer that ends the discussion." Rather, it is the final statement about your current position. It can be revised in the future. It does not have to be definitive.
The problem comes when the same article says two contradictory things and does not even try to reconcile them, or try to give a careful reader an accurate picture.
And I think that the sustained argument over how to read that article shows that Yegge did a bad job of writing to make a clear point, albeit a good job of creating hype.
Comment by habinero 22 hours ago
Comment by akst 11 hours ago
I think it's possible to convey that you believe strongly in your idea and that it could be the future (or "is the future" if you're so sure of yourself) while it is still experimental. I think he would get fewer critics if he wasn't so hyperbolic in his pitch and had fewer inflammatory personal remarks about people who he hasn't managed to bring on side.
People I know who communicate like that generally struggle to contribute constructively to nuanced discussions, and tend to seek out confrontation for the sake of it.
Comment by rlt 17 hours ago
Comment by taneq 8 hours ago
I think what’s next after an experiment very often is another experiment, especially when you’re doing this kind of exploratory R&D.
Comment by GoatInGrey 22 hours ago
Comment by gozzoo 20 hours ago
Comment by square_usual 19 hours ago
Comment by joshstrange 21 hours ago
Ok, I can accept that, it's a choice.
> Things said there may not reflect his actual thoughts on the subject(s) at hand.
Nope, you don't get to have it both ways. LLMs are just tools, there is always a human behind them and that human is responsible for what they let the LLM do/say/post/etc.
We have seen the hell that comes from playing the "They said that but they don't mean it" or "It's just a joke" (re: Trump), I'm not a fan of whitewashing with LLMs.
This is not an anti or pro Gas Town comment, just a comment on giving people a pass because they used an LLM.
Comment by idle_zealot 20 hours ago
The same approach actually applies to Trump and other liars. You can't take anything they say as truth or serious intent on its own; they're not engaging in good faith. You can remove yourself one step and attempt to analyze why they say what they do, and from there get at what to take seriously and what to disregard.
In Steve's case, my interpretation is that he's extremely bullish on AI and sees his setup or something similar as the inevitable future, but he sprinkles in silly warnings to lampshade criticism. That's how the two messages of "this isn't serious" and "this is the future of software development" co-exist. The first is largely just a cover and an admission that his particular project is a mess. Note that this interpretation assumes that the contents of the blog post in question were largely written by him, even if LLM assistance was used.
Comment by joshstrange 20 hours ago
I agree with you on Steve's case, and I have no ill will towards him. Mostly it was just me trying to "stomp" on giving him a pass, but, as you point out, that may not have been what the original commenter meant.
Comment by 63stack 6 hours ago
Comment by usefulcat 22 hours ago
Comment by jauntywundrkind 22 hours ago
Comment by WesolyKubeczek 21 hours ago
Comment by square_usual 19 hours ago
Comment by ludicity 19 hours ago
I suppose that has little to do with the technical merits of the work, but it's such a bad look, and it makes everyone boosting this stuff seem exactly as dysregulated/unwise as they've appeared to many engineers for a while.
I met Sean Goedecke for lunch a few weeks ago, who uses LLMs a bunch, and is clearly a serious adult, but half the folks being shoved in front of everyone are behaving totally manic and people are cheering them on. Absolutely blows my mind to watch.
https://pivot-to-ai.com/2026/01/22/steve-yegges-gas-town-vib...
Comment by skybrian 18 hours ago
> $GAS is not equity and does not give you any ownership interest in Gas Town or my work. This post is for informational purposes only and is not a solicitation or recommendation to buy, sell, or hold any token. Crypto markets are volatile and speculative — do not risk money you can’t afford to lose.
...
> Note: The next few sections are about online gambling in all its forms, where “investing” is the buy-and-hold long-form “acceptable” form of gambling because it’s tied to world GDP growth. Cryptocurrencies are subject to wild swings and spikes, and the currency tied to Gas Town is on a wild swing up. But it’s still gambling, and this stuff is only for people who are into that… which is not me, and should probably not be you either.
In the next post he said he wasn't going to shill it any more, and then the price collapsed and people sent him death threats on Twitter. It probably would have collapsed anyway. Perhaps there was supposedly some implicit bargain that he shouldn't take the money if he wasn't going to shill? Well, there's certainly no rule saying you have to do that.
I think he's not very much to blame for taking the money from degenerate gamblers, and the cryptocurrency idiots are mostly to blame for their own mistakes.
Comment by gavin-1 15 hours ago
I empathize with the disdain for crypto idiots, but I still think the people running or promoting these scams deserve most of the blame. "There's a market for my poison" is every dopamine dealer's excuse.
Comment by cap11235 14 hours ago
Comment by cannonpr 16 hours ago
Comment by dpatterbee 17 hours ago
Comment by skybrian 16 hours ago
Comment by matkoniecz 12 hours ago
In the same way, signposting and credibly warning "I murder people" does not make it OK to murder people.
Comment by skybrian 4 hours ago
Comment by andrepd 6 hours ago
So drug dealers are not to blame for taking the money from degenerate addicts! Let's free everyone and disband the DEA, we'll save billions of dollars.
Oh wait nvm this line of thinking only applies to sv people
Comment by wahnfrieden 11 hours ago
Details https://pivot-to-ai.com/2026/01/22/steve-yegges-gas-town-vib...
Comment by cap11235 12 hours ago
Also, 275k lines for a markdown todo app. Anyone defending this is an idiot. I'll just say that. Go ahead, defend it. Go do a code review on `beads`. Don't say it's alright, but gastown is madness. He fucking sucks.
Comment by piker 23 hours ago
Personally I got about 3 paragraphs into what seemed like a twelve-page fevered dream and filed it under "not for me yet".
Comment by chwtutha 23 hours ago
Exactly!
Comment by pja 23 hours ago
Comment by Xmd5a 23 hours ago
Comment by saidarembrace 23 hours ago
Comment by tikhonj 23 hours ago
Comment by CuriouslyC 22 hours ago
What I don't like is people me-tooing gastown as some breakthrough in orchestration. I also don't like how people are doing the same thing for ralph.
In truth, what I hate is people dogpiling thoughtlessly on things, and only caring about what social media has told them to care about. This tendency makes me get warm tingles at the thought of the end of the world. Agent Smith was right about humanity.
Comment by FuckButtons 20 hours ago
Comment by CuriouslyC 19 hours ago
Comment by aprilthird2021 18 hours ago
But that's often enough to loop over and over again and eventually finish a task
Comment by wrs 23 hours ago
Comment by square_usual 19 hours ago
Comment by lupire 18 hours ago
https://pivot-to-ai.com/2026/01/22/steve-yegges-gas-town-vib...
Comment by cap11235 12 hours ago
Comment by SomaticPirate 23 hours ago
When he decided to monetize the eyeballs on the project instead of anything related to the engineering. Which, of course, Steve isn't smart enough to understand (in his own words) and he recommends you not buy but he still makes a tidy profit from it.
It's a memecoin now... that has a software project attached to it. Anything related to engineering died the day he failed to disavow the crypto BS and instead started shilling it.
What happened to engineers not calling out BS as BS.
Comment by ewoodrich 20 hours ago
https://steve-yegge.medium.com/bags-and-the-creator-economy-...
Comment by lovich 19 hours ago
It makes it difficult to believe that gas town is actually producing anything of value.
I also lol at his bitching about how the bank didn’t let him do the transactions instantly as he describes himself how much of a scam this seems and how the worst thing is his bank account being drained, like banks don’t have a self interest in protecting their clientele from such scams.
Comment by vanderZwan 6 hours ago
Because I actually have an arts degree and I know the equivalent of a con artist in a rich people arts gallery bullshitting their way into money when I see one.
And the "pushing and crossing boundaries" argument has been abused as a pathetic defense to hide behind shallowness in the art world for longer than anyone in this discussion board has been alive. It's not provocative when it's utterly predictable, and in this case the "art" is "take the most absurd parody of AI culture and play it straight". Gee whiz how "creative" and "provocative".
Comment by tracerbulletx 23 hours ago
Comment by JamesTRexx 22 hours ago
The first thing I thought as I read his post and saw the images of the weasels was that he should make a game of it. Maybe name it Bitborn.
Comment by q3k 15 hours ago
Comment by wahnfrieden 11 hours ago
Comment by sailfast 16 hours ago
Comment by cap11235 11 hours ago
Comment by bdcravens 23 hours ago
Fear over what it means if it works.
Comment by mrkeen 22 hours ago
A couple of days ago I was sitting in a meeting of 10-15 devs, discussing our AI agents. People were raising issues and brainstorming ways around the problems with AI. How to make the AI better.
Our devs were occupied doing AI things, not accounting/banking things.
If the time savings were as promised, we should have been 3 devs (with the remaining devs replaced by 7-10 AI agents) discussing accounting/banking.
If Gas Town succeeds, it will just be the next toy we play with instead of doing our jobs.
Comment by square_usual 19 hours ago
This is only partly tongue in cheek :P
Comment by turtlebits 19 hours ago
It's like the ultimate RTS, plus you get paid.
Comment by stronglikedan 19 hours ago
Comment by bdcravens 10 hours ago
Comment by xyzsparetimexyz 16 hours ago
Comment by cap11235 14 hours ago
Comment by cap11235 11 hours ago
Comment by freedomben 19 hours ago
[Medium post]: https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...
[HN Discussion]: https://news.ycombinator.com/item?id=46458936
Comment by matkoniecz 12 hours ago
Comment by pydry 23 hours ago
Remember the days when people experimented with and talked about things that weren't LLMs?
I used to go to a lot of industry events and I really enjoyed hearing about the diversity of different things people worked on both as a hobby and at work.
Now it's all LLMs all the time and it's so goddamn tedious.
Comment by Ronsenshi 23 hours ago
I go to tech meetups regularly. The speed at which any conversation ends up on the topic of AI is extremely grating to me. No more discussions about interesting problems and creative solutions that people come up with. It's all just AI, agentic, vibe code.
At what point are we going to see the loss of practical skills if people keep on relying on LLMs for all their thinking?
Comment by magicalist 22 hours ago
And then you give in and ask what they're building with AI, that activation energy finally available to build the side project they wouldn't have built otherwise.
"Oh, I'm building a custom agentic harness!"
...
Comment by Analemma_ 22 hours ago
I can't even say that's definitely a losing bet-- it could very well happen-- but boy does it seem risky to go all-in on it.
Comment by FridgeSeal 19 hours ago
On the other, if a large portion of the industry goes all in, and it _doesn’t_ pay off and craters them, maybe the overhyping will move onto something else and we can go back to having an interesting, actually-nice-to-be-in-industry!
Comment by zdragnar 6 hours ago
He framed it as software developers were once the experts in the room, but so many young people joined the industry that managers turned to micromanaging them out of instinctual distrust. The manifesto was supposed to be the way for software developers to retake the mantle of the professional expert, trusted to make things happen.
I don't really think that happened, especially with agile becoming synonymous with Scrum, but if this doesn't pay off and craters the industry, it seems like it'd be the final nail in that coffin.
Comment by FeteCommuniste 21 hours ago
Comment by TeMPOraL 6 hours ago
(And similar to the two, I expect many of the initial ideas for LLM application to be bad, perhaps obviously stupid in hindsight. But enough of them will work to make LLMs become a lasting thing in every aspect of people's lives - again, just like electricity and the Internet did).
Comment by pydry 5 hours ago
~80% of the usage patterns I see these days falsely assume that LLMs can handle their own quality control and are optimizing for appearance, potential or demo-worthiness rather than hardcore usefulness. Gas Town is not an outlier here.
When the internet and electricity were ~3 years old, people were already using them for stuff that was working and obviously world-changing rather than as demos of potential.
The 20% of usage patterns that work now aren't going away, but the other 80% are going to be seen as blockchainesque hype in 5 or 10 years.
Comment by itsafarqueue 21 hours ago
And maybe he’s even right. But the reaction is to the flavour of chip on the shoulder delivery mixed into an otherwise fun piece.
Comment by cap11235 14 hours ago
Comment by rulelet 6 hours ago
So even if the super serious engineers are serious, they should watch their back. Eventually enough guardrails will be created or even the ask will change enough for a lot of automation to happen. And make no mistake, it is automation no different than having automated testing replace armies of manual testing or code generation or procedural generation or any other machine method. And who is going to be left with jobs? People who embrace the change, not people who lament for the good old days or who can't adapt.
Sucks but just how the world works. Sit on the bleeding edge or be burned. Yes there is an "enough" but I suspect enough is around people willing to look at Gastown or even make their own Gastown, not the other side.
Comment by DonHopkins 5 hours ago
Comment by DonHopkins 6 hours ago
I've been reading Yegge since the "Stevey's Drunken Blog Rants™" days -- his rantings on Lisp, Emacs, and the Eval Empire shaped how I approach programming. His pro-LLM-coding rants were direct inspiration for my own work on MOOLLM. The guy has my deep respect, and I'm intrigued by his recent work on Sourcegraph and Gas Town.
Gas Town and MOOLLM are siblings from that same Eval Empire -- both oriented along the Axis of Eval, both transgressively treating LLMs as universal interpreters. MOOLLM immanentizes Eval Incarnate -- https://github.com/SimHacker/moollm/blob/main/designs/eval/E... -- where skills are programs, the LLM is eval(), and play is but the first step of the "Play Learn Lift" methodology: https://github.com/SimHacker/moollm/tree/main/skills/play-le....
The difference is resource constraints. Yegge has token abundance; I'm paying out of pocket. So where Gas Town explores "what if tokens were free?" (20-30 Claude instances overnight), MOOLLM explores "what if every token mattered?" Many agents, many turns, one LLM call.
To address wordswords2's concern about "no metrics or statistics" -- I agree that's a gap in Gas Town. MOOLLM makes falsifiable claims with receipts. Last night I ran an Amsterdam Fluxx Marathon stress test: 116+ turns, 4 characters (120+ character-turns per LLM call), complex social dynamics on top of dynamic rule-changing game mechanics. Rubric-scored 94/100. The run files exist. Anyone can audit.
qcnguy's critique ("same thing multiplied by ten thousand") is exactly the kind of specific feedback that helps systems improve. I wrote a detailed analysis comparing the two approaches -- intellectual lineage (Self, Minsky's K-lines, The Sims, LambdaMOO), the "vibecoded" problem (MOOLLM is LLM-generated but rigorously iterated, not ship-and-hope), and why "carrier pigeon" IPC architecture is a dark pattern when LLMs can simulate many agents at the speed of light.
an0malous raises a real fear about bosses thinking "throw agents at it" replaces engineering. Both systems agree: design becomes the bottleneck. Gas Town says "keep the engine fed with more plans." MOOLLM says "design IS the point -- make it richer." Different answers, same problem.
lowbloodsugar mentions building a "proper, robust, engineering version" -- I'd love to compare notes. csallen is right that "future" doesn't mean "production-grade today."
Analysis: https://github.com/SimHacker/moollm/blob/main/designs/GASTOW...
MOOLLM repo: https://github.com/SimHacker/moollm
Happy to discuss tradeoffs or hear where my claims don't hold up. Falsifiable criticism welcome -- that's how systems improve.
Comment by DonHopkins 5 hours ago
I ran a 260KB session log where I convened a simulated symposium of computing pioneers to design an Adventure Compiler — a tool that compiles YAML adventure definitions that run on MOOLLM under cursor into standalone deterministic browser games requiring no LLM at runtime.
The twist: the "attendees" include AI-simulated tributes to Will Wright, Alan Kay, Marvin Minsky, Seymour Papert, Ted Nelson, Ken Kahn, Gary Drescher, and 25+ others — both living legends and memorial candles for those who've passed. All clearly marked as simulated tributes, not transcripts.
What emerged from this thought experiment:
- Pie menus as the universal interface (rooms, inventory, dialogue trees)
- Sims-style needs system with YAML Jazz inner voice ("hunger: 1 # FOOD. FOOD. FOOD.")
- Prototype-based objects (Self/JavaScript delegation chains)
- Schema mechanism + LLM = "teaching them to fly"
- Git as the collaboration operating system
- ToonTalk-inspired "programming by petting" for terpene kittens
- Speed of Light simulation — the opposite of "carrier pigeon" multi-agent architectures
On that last point: most multi-agent systems use message passing between separate LLM calls. Agent A generates output, it gets detokenized to text, sent over IPC, retokenized into Agent B's context. MOOLLM inverts this. Everything happens in one LLM call.
The spatial MOO map (rooms connected by exits) provides navigation, but communication is instantaneous within a call. Many agents, many turns, zero latency between them — and zero token requantization or semantic noise from successive detokenization/tokenization loops.
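To make the contrast above concrete, here is a rough back-of-the-envelope sketch. It is not MOOLLM's or Gas Town's actual code: the function names, the `Cost` type, and the simplifying assumption that message passing costs one LLM call per agent turn are all hypothetical. It only counts LLM calls and the text hand-offs between them.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    llm_calls: int          # separate LLM invocations
    text_round_trips: int   # detokenize -> IPC -> retokenize hops between agents


def message_passing_cost(agents: int, turns: int) -> Cost:
    """Assumed model: every agent turn is its own LLM call, and each
    output is handed to the next call as plain text."""
    calls = agents * turns
    # Every call except the last hands its text output to another call.
    return Cost(llm_calls=calls, text_round_trips=max(calls - 1, 0))


def single_call_cost(agents: int, turns: int) -> Cost:
    """Assumed model: all agents and turns are simulated inside one call,
    so no text crosses a process boundary between agents."""
    return Cost(llm_calls=1 if agents * turns > 0 else 0, text_round_trips=0)


if __name__ == "__main__":
    print(message_passing_cost(agents=4, turns=30))
    print(single_call_cost(agents=4, turns=30))
```

Under these assumptions, 4 agents taking 30 turns each means 120 calls with message passing versus 1 call in the single-call style. The sketch deliberately ignores context-window limits and output quality, which is where the real tradeoff would have to be measured.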
The session includes adversarial brainstorming where Barbara Liskov challenges schema contracts, James Gosling questions performance, Amy Ko pushes accessibility, and Bret Victor demands immediate feedback. Each critique gets a concrete response.
Concrete outputs: a working linter, architecture decisions, 53 indexed topics from "Food Oriented Programming" to "Hidden Objects as Invisible Infrastructure."
This is MOOLLM's Play-Learn-Lift methodology in action — play with ideas, extract patterns, lift into reusable skills and efficient scripts.
Session log (260KB, 8000+ lines): https://github.com/SimHacker/moollm/blob/main/examples/adven...
MOOLLM repo: https://github.com/SimHacker/moollm
The session uses representation ethics guidelines — all simulated people are clearly marked, deceased figures invoked with memorial candles, and the framing is explicitly "educational thought experiment."
Happy to discuss the ethics of simulating people, the architecture decisions, or how this relates to my earlier Gas Town comparison post.
Comment by DonHopkins 4 hours ago
>Doug Engelbart (Augmentation): "Bootstrapping. The tools that build the tools. Your adventure compiler should be able to compile ITS OWN documentation into an adventure ABOUT how it works. The manual is a playable game."
That is exactly how the self documenting categorized skill directory/room works -- the directory is a room with subdirectories for every skill, themselves intertwingled rooms, which form a network you can navigate around via k-lines (see also tags).
Here is the skills dir, with the ROOM.yml file that makes it a room (like COM QueryInterface works: multiple interfaces available for a class, for multiple aspects of it, the directory is IUnknown and you can QI by looking for known interfaces like ROOM.yml, CHARACTER.yml, CONTAINER.yml that inherit from the corresponding skills).
And the README.md file is naturally the ubiquitous human-readable documentation (also great for LLM deep dives). GitHub kindly formats and publishes README.md on every repo directory page, supporting Mermaid diagrams, etc.:
MOOLLM Skills dir:
https://github.com/SimHacker/moollm/tree/main/skills
MOOLLM Skills room, with skill K-Line navigation protocol:
https://github.com/SimHacker/moollm/blob/main/skills/ROOM.ym...
# ROOM.yml — The Skill Nexus
#
# This is a ROOM — a metaphysical library where all capabilities live.
# Every skill is a book that teaches itself when you read it.
# Every cluster is a shelf of related knowledge.
# Every ensemble is a team that works together.
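The QueryInterface analogy described earlier can be sketched in a few lines. This is only an illustration of the idea, not code from the MOOLLM repo: the file names come from the comment above, while `query_interface`, `interfaces_of`, and the rule "a directory exposes an interface if the file exists" are assumptions made for the sketch.

```python
from pathlib import Path
from typing import List, Optional

# File names taken from the comment above; treating their presence as
# "interfaces" is this sketch's reading of the QueryInterface analogy.
KNOWN_INTERFACES = ("ROOM.yml", "CHARACTER.yml", "CONTAINER.yml")


def query_interface(directory: Path, interface: str) -> Optional[Path]:
    """Return the interface file if the directory provides it, else None."""
    candidate = directory / interface
    return candidate if candidate.is_file() else None


def interfaces_of(directory: Path) -> List[str]:
    """List which of the known interfaces a directory exposes."""
    return [name for name in KNOWN_INTERFACES
            if query_interface(directory, name) is not None]
```

Calling `interfaces_of(Path("skills"))` on a checkout would list whichever of those files sit directly in that directory. How the real system resolves inheritance from the corresponding skills is not covered here.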
To go meta, you can enter the Skill Skill (skills/skill), an extended MOOLLM meta-skill that knows all about creating new skills (via the constructionist "Play Learn Lift" strategy), and importing and upgrading Anthropic skills: https://github.com/SimHacker/moollm/tree/main/skills/skill
And here is a narrative session of me giving a tour of the category and skill networks by hopping around through K-Lines!
MOOLLM currently has 103 Anthropic compatible but extended skills (using 7 MOOLLM extensions, like CARD.yml, K-Lines, Self Prototypes and Delegation, etc).
Session Log: K-Line Connections Safari:
https://github.com/SimHacker/moollm/blob/main/examples/adven...
Eight luminaries have been summoned as Hero-Story familiars — not puppets, but conceptual guides whose traditions we invoke. Each carries the K-lines they pioneered. [...]
ENTERING THE SKILL NEXUS
You push through a shimmering membrane and step into the Skill Nexus.
The space is impossible — a vast spherical chamber where books float in mid-air, orbiting a central point of warm golden light. But these aren't books. They're SKILLS. Living documents that teach themselves when you read them.
Lines of golden light connect related skills. Each connection pulses with meaning. This isn't a library — it's a constellation of knowledge.
Your companions materialize beside you:
Marvin Minsky adjusts his glasses, looking around with evident satisfaction.
"Ah! K-lines made manifest. Each of these floating tomes is a knowledge structure. Touch one and it reactivates an entire constellation of associations. I wrote about this in 1985, but I never imagined seeing it rendered so... literally."
Ted Nelson is already examining the golden threads between skills.
"Two-way links! Every connection goes BOTH directions. When skill A references skill B, skill B knows about skill A. This is what I've been trying to explain since 1965! Everything is deeply intertwingled!"
James Burke turns to address an invisible camera.
"You're looking at the Skill Nexus. A room where every door leads to another room, and every room has doors to everywhere else. But here's the thing — the signs above each door tell you WHY. Not just where you're going, but what connects HERE to THERE. That's what we're going to explore."
Palm scampers up to a floating skill-book labeled "incarnation" and hugs it.
"This is where I became REAL! Don spoke the wish, the tribunal approved, and I wrote my own soul."
Comment by q3k 11 minutes ago
Comment by Barrin92 18 hours ago
There's no art (or engineering) in this and the only provocative thing about it is that Yegge apparently decided to turn it into a crypto scam. I like the intersection of engineering and art but I prefer if it includes both actual engineering and art, 100 rabbits (100r.co) is a good example of it, not a blog post with 15 AI generated images in it that advocates some unholy combination of gambling, vibe coding and cryptocurrency crap.
Comment by Johnny_Bonk 23 hours ago
Comment by guelo 17 hours ago
Comment by ares623 20 hours ago
Comment by AtlasBarfed 23 hours ago
Just for fun!
Comment by walthamstow 23 hours ago
Comment by toraway 19 hours ago
https://steve-yegge.medium.com/bags-and-the-creator-economy-...
Comment by walthamstow 18 hours ago
Comment by NedF 19 hours ago
Comment by usefulposter 1 day ago
So true! Not to mention the garbled text and inconsistent visuals across the diagrams———an insult to the reader's intelligence. How do people tolerate this visual embodiment of slurred speech?
Comment by falcor84 1 hour ago
As Basil Exposition said "I suggest you don’t worry about this sort of thing and just enjoy yourself".
Comment by toraway 23 hours ago
Which is unfortunate as it would have been really helpful to have actually legible architecture diagrams, given the prose was so difficult for me to untangle due to the manic “fun” irreverent style (and it’s fine to write with a distinctive voice to make it more interesting, but still … confusing).
Plus the dozens of new unique names and connections introduced every few paragraphs to try to keep in my head…
I first asked Gemini 3 Pro to condense it to a boring technical overview and it produced a single page outline and Mermaid diagrams that were nearly as unintelligible as the original post so even AI has issues digesting it apparently…
Comment by cap11235 11 hours ago
Comment by sethaurus 12 hours ago
Comment by MrOrelliOReilly 23 hours ago
Comment by zingar 20 hours ago
Comment by jbgreer 48 minutes ago
Comment by sandinmyjoints 21 hours ago
> A more conservative, easier to consider, debate is: how close should the code be in agentic software development tools? How easy should it be to access? How often do we expect developers to edit it by hand?
> Framing this debate as an either/or – either you look at code or don’t, either you edit code by hand or you exclusively direct agents, either you’re the anti-AI-purist or the agentic-maxxer – is unhelpful.
> The right distance isn’t about what kind of person you are or what you believe about AI capabilities in the current moment. How far away you step from the syntax shifts based on what you’re building, who you’re building with, and what happens when things go wrong.
Comment by visarga 38 minutes ago
If you're looking at all your code you are just walking the motorcycle. You need tests to automate your eyes. In fact I believe tests and specs are the new product, code can be regenerated at will.
That is why we see vibe coding projects that replicate well specced and implemented products like web browsers, you get both the specs and differential testing for free.
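Differential testing, as mentioned above, can be sketched in a few lines. This is a toy example under stated assumptions: `reference_sort` stands in for the existing, trusted product and `candidate_sort` for regenerated code. Both names, and the choice of sorting as the task, are purely illustrative.

```python
import random


def reference_sort(xs: list[int]) -> list[int]:
    """Stand-in for the trusted, existing implementation."""
    return sorted(xs)


def candidate_sort(xs: list[int]) -> list[int]:
    """Stand-in for regenerated code under test (here: an insertion sort)."""
    out: list[int] = []
    for x in xs:
        i = len(out)
        # Walk left past every element larger than x, then insert.
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def differential_test(reference, candidate, trials: int = 200, seed: int = 0):
    """Return the first random input where the two disagree, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        if reference(list(xs)) != candidate(list(xs)):
            return xs
    return None


if __name__ == "__main__":
    print(differential_test(reference_sort, candidate_sort))
```

The reference implementation acts as the oracle, so nobody has to write expected outputs by hand. That is the sense in which a well-specced existing product gives you tests "for free".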
Comment by athrowaway3z 10 hours ago
I'm not sure if there are that many. We need to be vigilant of "it feels useful & powerful", because it's so easy to feel that way.
When I write complex plans, I can tell Claude to spawn agents for each task and I can successfully 1-shot a 30-60 minute implementation.
I've toyed with more complicated patterns, but unlike this speculative fiction, I did need my result to be both simple and working.
A couple of times now I've had to spend a lot of hours trying to unfuck a design I let slip through. The kind where 1 agent injects some duplicate code/architecture pattern into the system that's correct enough not to be flagged, but wrong enough to forever trip up every subsequent fresh agent that stumbles on it.
I tell people my job now is to kick these things every 15 minutes. It's a kinda joke, kinda not. But they definitely need kicking. Without it, the decoherence of a non-trivial project is too high, and you still need time to know where and how to kick.
I'm not sure what I'd need to be convinced a higher level of orchestration can do that. I do like to try new things. But my spider-sense is telling me this is a Collatz-conjecture-esque dead-end. People get the feeling of making giant leaps of progress, which anybody using these things should be familiar with by now, but something valuable is always just out of reach with the tools we currently have.
There are some big gains by guiding agents/users to use more sub agents with a clean context - perhaps with some more knobs - but I'd advise against acting under the assumption using grander orchestration tools will inevitably have a positive ROI.
Comment by slfnflctd 1 day ago
This quote sums it all up for me. It's a crazy project that moves the conversation forward, which is the main value I see in it.
It very well could be a logjam breaker for those who are fortunate enough to get out more than they put into it... but it's very much a gamble, and the odds are against you.
Comment by shermantanktop 23 hours ago
It's the same chasm that all the AI vendors are exploiting: the gap between people who have some idea what is going on and the vast mass of people who don't but are addicted to excitement or fear of the future.
Yegge is being fake-playful about it but if you have read any of his other writing, this tracks. None of it is to be taken very seriously because he values provocation and mischief a little too highly, but bits of it have some ideas worth thinking about.
Comment by pydry 5 hours ago
I detected a noticeable uptick in posts on reddit bragging about AI coding in the last month which fit the pattern of other opinion shaping astroturfing projects I've seen before.
If Claude came to me with a bundle of cash and tokens to encourage me to keep the AI coding hype train going I'd also go heavy on the excitability, experimental attitude, humor and irreverence.
I'd also leave a mountain of disclaimers to help protect future me's reputation.
Comment by falcor84 1 hour ago
Over the last few years, people have been playing around with trying to integrate LLMs into cognitive architectures like ACT-R or Soar, with not much to show for it. But I think that here we actually have an example of a working cognitive architecture that is capable of autonomous long-term action planning, with the ability to course-correct and stay on task.
I wouldn't be surprised if future science historians will look at this as an early precursor to what will eventually be adapted to give AIs full agentic executive functioning.
Comment by alvatar 4 hours ago
Comment by chrisss395 1 hour ago
I would love to see Steve consider different command and control structures, and re-consider how work gets done across the development lifecycle. Gas Town's command and control structure read to me like "how a human would think about making software." Even the article admits you need to re-think how you interact in the Gas Town world. It actually may understate this point too much.
Where and how humans interact feels like something that will always be an important consideration, both in a human & AI dominated software development world. At least from where I sit.
Comment by perrygeo 2 hours ago
But I think there's a real missed opportunity here. I don't think it goes far enough. Who wants some giant complex system of agents conceived by a human? The agents, their roles and relationships, could be dynamically configured according to the task.
What good is removing human judgement from the loop, only to constrain the problem by locking in the architecture a priori? It just doesn't make sense. Your entire project hinges on the waterfall-like nature of the agent design! That part feels far too important, but Gas Town doesn't have much curiosity at all about changing that. These Mayors, and Polecats, and Witnesses, and Deacons are but one of infinite ways you could arrange things. Why should there be just one? Why should there be an up-front design at all? A dynamic, emergent network of agents feels like the real opportunity here.
Comment by suriya-ganesh 1 day ago
I don't get it. Even with a very good understanding of what type of work I am doing and prebuilt knowledge of the code, even for a very well specced problem, Claude Code etc. just plain fail or produce sloppy code. How do these industry figures claim they look at no part of a 225K+ line codebase and promise that it works?
It feels like we're getting into an era where oceans of code which nobody understands are going to be produced, which we hope AGI swoops in and cleans up?
Comment by jrmg 1 day ago
They _can_ usually be manually tidied and fixed, with varying amounts of effort (small project = easy fixes, on a par with regular code review, large project = “this would’ve been easier to write myself...”)
I guess Gas Town’s multiple layers of supervisory entities are meant to replace this manual tidying and fixing, but, well, really?
I don’t understand how people are supposedly having so much success with things like this. Am I just holding it wrong?
If they are having real success, why are there no open source projects that are AI developed and maintained that are _not_ just systems for managing AI? (Or are there and I just haven’t seen them?...)
Comment by consumer451 12 hours ago
Then Opus 4.5 was released. I already had my CC CLAUDE.md, and Windsurf global rules + workspace rules set up. Also, my main money making project is React/Vite/Refine.dev/antd/Supabase... known patterns.
My point is that given all that, I can now deploy amazing features that "just work," and have excellent ux in a single prompt. I still review all commits, but they are now 95% correct on front end, and ~75% correct on Postgres migrations.
Is it magic? Yes. What's worse is that I believe Dario. In a year or so, many people will just create their own Loom or Monday.com equivalent apps with a one page request. Will it be production ready? No. Will it have all the features that everyone wants? No. But it will do what they want, which is 5% of most SaaS feature sets. That will kill at least 10% of basic SaaS.
If Sonnet 3.5 (~Nov 2024) to Opus 4.5 (Nov 2025) progress is a thing, then we are slightly fucked.
"May you live in interesting times" - turns out to be a curse. I had no idea. I really thought it was a blessing all this time.
Comment by kaydub 23 hours ago
Like, why are you manually tidying and fixing things? The first pass is never perfect. Maybe the functionality is there but the code is spaghetti or untestable. Have another agent review and feed that review back into the original agent that built out the code. Keep iterating like that.
My usual workflow:
Agent 1 - Build feature
Agent 2 - Review these parts of the code, see if you find any code smells, bad architecture, scalability problems that will pop up, untestable code, or anything else falling outside of modern coding best practices
Agent 1 - Here's the code review for your changes, please fix
Agent 2 - Do another review
Agent 1 - Here's the code review for your changes, please fix
Repeat until testable, maybe throw in a full codebase review instead of just the feature.
Agent 1 - Code looks good, start writing unit tests, go step by step, let's walk through everything, etc. etc. etc.
Then update your .md directive files to tell the agents how to test.
Voila, you have an llm agent loop that will write decent code and get features out the door.
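The build/review loop described above can be sketched as plain control flow. This is a minimal sketch, not a real agent harness: `build` and `review` are hypothetical callables that would each wrap an LLM call in practice, and the stub agents below exist only so the loop can run end to end.

```python
from typing import Callable

# Hypothetical agent interfaces; in practice each would wrap an LLM CLI or API call.
Builder = Callable[[str, str], str]    # (task, review feedback) -> code
Reviewer = Callable[[str], list[str]]  # code -> list of issues (empty = approved)


def build_review_loop(task: str, build: Builder, review: Reviewer,
                      max_rounds: int = 5) -> tuple[str, bool]:
    """Alternate build and review until the reviewer is satisfied or rounds run out."""
    feedback = ""
    code = ""
    for _ in range(max_rounds):
        code = build(task, feedback)
        issues = review(code)
        if not issues:
            return code, True
        feedback = "\n".join(issues)  # fed back into the builder next round
    return code, False


def stub_build(task: str, feedback: str) -> str:
    """Toy builder: adds a docstring only after a reviewer asks for one."""
    doc = '    """Add two numbers."""\n' if "docstring" in feedback else ""
    return f"def add(a, b):\n{doc}    return a + b\n"


def stub_review(code: str) -> list[str]:
    """Toy reviewer: its only rule is that a docstring must be present."""
    return [] if '"""' in code else ["missing docstring"]


if __name__ == "__main__":
    code, approved = build_review_loop("write add()", stub_build, stub_review)
    print(approved)
    print(code)
```

The `max_rounds` cap is a design choice of this sketch: without it, a builder and reviewer that never converge would loop (and bill) forever.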
Comment by joshstrange 21 hours ago
Maybe I need a stricter harness but I feel like I did try that and still didn't get good results.
Comment by kaydub 20 hours ago
Comment by joshstrange 20 hours ago
I'll keep testing it but that just hasn't been my experience, I sincerely hope that changes because an agent that runs unit tests [0] and can write them would be very powerful.
[0] This is a pain point for me. The number of times I've watched Claude run "git commit --no-verify"... I've told it in CLAUDE.md to never bypass commit checks, I've told it in the prompt, I've added it 10 more times in different places in CLAUDE.md but still, the agent will always reach for that if it can't fix something in 1-3 iterations. And yes, I've told it "If you can't get the checks to pass then ask me before bypassing the checks".
It doesn't matter how many guardrails I put up and how good they are if the agent will lazily bypass them at the drop of a hat. I'm not sure how other people are dealing with this (maybe with agents managing agents and checking their work? A la Gas Town?).
Comment by kaydub 20 hours ago
When I work on issues I create a new branch off of master, let the llm go to town on it, then I manually commit and push to remote for an MR/PR. If there are any errors on the commit hooks I just feed the errors back into the agent.
Comment by joshstrange 20 hours ago
Comment by toraway 19 hours ago
I’m trying to redesign my setup to use hooks now instead because poor adherence to rules files across all the agentic CLIs is exhausting to work around.
(and no, Opus 4.5 didn’t magically solve this problem to preemptively respond to that reply)
Comment by kaydub 18 hours ago
I wonder if some people are putting in too much into their markdown files of what NOT to do.
I hate people saying the llms are just better auto-correct, but in some ways they're right. I think putting in too much "don't do this" is leading the llm down the path to do "this" because you mentioned it at all. The LLM is probabilistically generating its response based on what you've said and what's in the markdown files, so the fact you put some of that stuff in there at all probably increases the probability those things will show up.
Comment by kaydub 18 hours ago
For the llm a lot of linting and build/test tools go into simple scripts that the llm can run and get shorthand info out of. Some tools, if you have the llm run them, they're going to ingest a lot from the output (like a big stacktrace or something). I want to keep context clean so I have the llm create the tool to use for build/test/linting and I tell it to create it so the outputs will keep its context clean, then I have it document it in the .md file.
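One way to picture the "keep context clean" wrapper scripts described above is a small runner that prints a one-word result on success and only the tail of the output on failure. This is a sketch under assumptions: the name `run_quiet` and the 15-line default are made up for illustration, and a real project would wrap its own build/test/lint commands.

```python
import subprocess
import sys


def run_quiet(cmd: list[str], tail_lines: int = 15) -> tuple[bool, str]:
    """Run `cmd` and return (ok, short summary) instead of the full output."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return True, "OK"
    # On failure, keep only the last few lines, where the error usually is.
    combined = (proc.stdout + proc.stderr).splitlines()
    tail = "\n".join(combined[-tail_lines:])
    return False, f"FAILED (exit {proc.returncode})\n{tail}"


if __name__ == "__main__":
    # A noisy failing command: 100 lines of output, then a non-zero exit.
    noisy = "import sys; [print('line', i) for i in range(100)]; sys.exit(3)"
    ok, summary = run_quiet([sys.executable, "-c", noisy], tail_lines=2)
    print(ok)
    print(summary)
```

The agent then sees a few lines per tool run rather than a full stack trace, which is the effect the comment is after.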
When working with the LLM I have to start out pretty explicit about using the tooling. As we work through things it will start to automatically run the tooling. Sometimes it will want to do something else, I just nudge it back to use the tooling (or I'll ask it why or if there are benefits to the other way and if there are we'll rebuild the tooling to use the other way).
Finally, if the LLM is really having trouble, I kill the session and start a new one. It used to feel bad to do that. I'd feel like I'm losing a lot of info that's in context. But now, I feel like it's not so bad... but I'm not sure if that's because the llms are better or if my workflow has adapted.
Now, let me backup a little bit. I mentioned that I don't have the llm use git. That's the control I maintain. And with that my workflow is: llm builds feature->llm runs linters/tests->I e2e test whatever I'm building by deploying to a dev/staging/local env->once verified I commit. Now I will continue that context window/session until I feel like the llm starts fucking up. Then I kill the session and start a new one. I rarely compact, but it does happen and I generally don't fret about it too much.
I try to keep my units of work small and I feel like it does the best when I do. But then I often find myself surprised at how much it can do from a single prompt, so idk. I do understand some of the skepticism because a lot of this stuff sounds "hand-wavy". I'm hoping we all start to hone in on some general more concrete patterns but with it being so non-deterministic I'm not sure if we will. It feels like everyone is using it differently and people are having successes and failures across different things. People where I work LOVE MCPs but I can't stand them. When I use them it always feels like I have to remind the llm that it has an MCP, then it feels like the MCP takes too much context window and sometimes the llm still trips over how to use it.
Comment by joshstrange 18 hours ago
Comment by mh2266 12 hours ago
import sys

if "--no-verify" in sys.argv:
    print("--no-verify is not allowed", file=sys.stderr)
    sys.exit(1)
and otherwise forwards to the underlying `git`
Comment by Paracompact 16 hours ago
After manual line-by-line inspection and hand-tweaks, it still saved me time. But it's going to be a long, long time before I no longer manually tweak things or trust that there are no silent mistakes.
Comment by enraged_camel 20 hours ago
This has not happened to me since Sonnet 4.5. Opus 4.5 is especially robust when it comes to writing tests. I use it daily in multiple projects and verify the test code.
Comment by joshstrange 20 hours ago
Comment by kapimalos 19 hours ago
Are you using Claude Code? How do you run the agents and make them speak?
Comment by kaydub 19 hours ago
I messed around with separate "agents" in the same context window for a while. I even went as far as playing with strands agents. Having multiple agents was a crapshoot.
Sometimes they'd work great, but sometimes they'd start working on the same files at the same time, argue with each other, etc. I'd always get multiple agents working, at least how I assumed they should work, by telling the llm explicitly what agents to create and what work to pass off to what agents. And it did a pretty poor job of that. I tried having orchestration agents, but at a certain point the orchestration agent would just take over and do everything. So I'm not big on having multiple agents (in theory it sounds great, especially since they are supposed to each have their own context window). When I attempted doing this kind of stuff with strands agents it honestly felt like I was trying to recreate claude, so I just stick with plain cli llm tools for now.
Comment by pdntspa 23 hours ago
I've written two separate moderately-sized codebases using agentic techniques (oftentimes being very lazy and just blanket approving changes), and I don't encounter logic or off-by-one errors very often if at all. It seems quite good at the basic task of writing working code, but it sucks at architecture and you need occasional code review rounds to keep the codebase tidy and readable. My code reviews with the AI are like 50% DRY and separating concerns.
Comment by johnmaguire 23 hours ago
Comment by kami23 23 hours ago
Comment by d1sxeyes 23 hours ago
Comment by kaydub 23 hours ago
Are you guys just trying to one shot stuff? Are you not using agents to iterate on things? Are you not putting agents against each other (have one code, one critique/test the code, and put them in a loop)?
I still look at the code that's produced, I'm not THAT far down the "vibe coding" path that I'm trusting everything being produced, but I get phenomenal results and I don't actually write any code any more.
So like, yeah, first pass the llm will create my feature and there's definitely some poorly written code or duplicate code or other code smells, but then I tell another agent to review and find all these problems. Then that review gets fed back in to the agent that created the feature. Wham, bam, clean code.
I'm not using gastown or ralph wiggum ($$$) but reading the docs, looking over how things work, I can see how it all comes together and should work. They've been built out to automatically do the review + iteration loop that I do.
Comment by arrowleaf 21 hours ago
You can't be too prescriptive or verbose when interacting with them, you have to interact with them a bit to start understanding how they think and go from there to determine what information or context to provide. Same for understanding their programming styles, they will typically do what they're told but sometimes they go on a tangent.
You need to know how to communicate your expectations. Especially around testing and interaction with existing systems, performance standards, technology, the list goes on.
Comment by kaydub 21 hours ago
I think this is something a lot of people are telling themselves though, sure.
Comment by lknuth 18 hours ago
Comment by kaydub 17 hours ago
What about git stats?
I can tell you the guys that are consistently pushing code AND having the biggest impact are using LLM tools.
Comment by direwolf20 16 hours ago
Comment by kaydub 14 hours ago
Comment by doganugurlu 10 hours ago
The OP was right to assume it was lines of code. Another assumption could be number of commits, which also doesn’t measure impact.
Comment by matkoniecz 12 hours ago
What did you mean by that?
Comment by kaydub 11 hours ago
That plus the completion of high impact projects makes good strong engineers.
Those are the people I see using LLMs
Comment by direwolf20 4 hours ago
Comment by alecbz 23 hours ago
Comment by sjajshha 22 hours ago
Comment by kaydub 22 hours ago
Comment by habinero 22 hours ago
The problem is some 0.05X developers thought they were 0.5X and now they think they're 2X.
Comment by kaydub 21 hours ago
In my real life experience it's been the middling devs that always talk about "ai slop" and how the tools can't do their jobs.
Comment by enraged_camel 20 hours ago
- those who have embraced AI and learned to use it well
- those who have embraced AI but treat it as a silver bullet
- those who reject AI
First group is by far the most productive and adds the most value to the team.
Comment by kaydub 18 hours ago
If anything the silver bullet people are mostly managers and C levels... some of which don't even use the tools themselves.
Of the devs that rejected it at first, the ones with the same sentiment I'm seeing online in threads like these, we forced one to give it a try. He now totters between using it well and treating it as a silver bullet. I still hear him incredulous about the things claude does at meetings, "I had to do <thing> and I thought I'd let claude get a crack at it... did it in one shot"
Comment by habinero 8 hours ago
Comment by habinero 8 hours ago
Comment by joshstrange 21 hours ago
YES! I have been playing with vibe coding tools since they came out. "Playing" because only on rare occasions have I created something that is good enough to commit/keep/use. I keep playing with them because, well I have a subscription, but also so I don't fall into the fuddy-duddy camp of "all AI is bad" and can legitimately speak on the value, or lack thereof, of these tools.
Claude Code is super cool, no doubt, and with _highly targeted_ and _well planned_ tasks it can produce valuable output. Period. But, every attempt at full-vibe-coding I've done has gotten hung up at some point and I have to step in and manually fix things. My experience is often:
1. First Prompt: Oh wow, this is amazing, this is the future
2. Second Prompt: Ok, let me just add/tweak a few things
10. 10th prompt: Ugh, everytime I fix one thing, something else breaks
I'm not sure at all what I'm doing "wrong". Flogging the agents along doesn't work well for me, or maybe I am just having trouble letting go of the control and I'm not flogging enough?
But the bottom line is I am generally shocked that something like Gas Town was able to be vibe-coded. Maybe it's a case of the LLM overstating what it's accomplished (typical) and if you look under the hood it's doing 1% of what it says it is but I really don't know. Clearly it's doing something, but then I sit over here trying to build a simple agent with some MCPs hooked up to it using a LLM agent framework and it's falling over after a few iterations.
Comment by dceddia 20 hours ago
One thing that stands out in your steps and that I’ve noticed myself: yeah, by prompt 10, it starts to suck. If it ever hits “compaction” then that’s past the point of no return.
I still find myself slipping into this trap sometimes because I’m just in the flow of getting good results (until it nosedives), but the better strategy is to do a small unit of work per session. It keeps the context small and that keeps the model smarter.
“Ralph” is one way to do this. (decent intro here: https://www.aihero.dev/getting-started-with-ralph)
Another way is “Write out what we did to PROGRESS.md” - then start new session - then “Read @PROGRESS.md and do X”
Just playing around with ways to split up the work into smaller tasks basically, and crucially, not doing all of those small tasks in one long chat.
Comment by joshstrange 20 hours ago
> Another way is “Write out what we did to PROGRESS.md” - then start new session - then “Read @PROGRESS.md and do X”
I agree on small context and if I hit "compacting" I've normally gone too far. I'm a huge fan of `/clear`-ing regularly or `/compact <Here is what you should remember for the next task we will work on>` and I've also tried "TODO.md"-style tracking.
I'm conflicted on TODO.md-style tracking because in practice I've had an agent work through everything on the list, confidently telling me steps are done, only to find that's not the case when I check its work. Both a TODO.md that I created and one I had the agent create suffer from this. Also, getting it to update the TODO.md has been frustrating: even when I add "Make sure to mark tasks as complete in the TODO.md as you finish them" to CLAUDE.md, or add the same message to the end of all my prompts, it won't always update it.
I've been interested in trying out beads to see if it works better than a markdown TODO file but I haven't played with that yet.
But overall I agree with you, smaller chunks are key to success.
Comment by square_usual 19 hours ago
Comment by theropost 19 hours ago
Comment by joshstrange 19 hours ago
Comment by EFreethought 19 hours ago
Maybe that is the time to start making changes by hand. I think this dream of humans never ever writing any more code might be too far and unnecessary.
Comment by kgwgk 23 hours ago
The only promise is that you will get your face ripped off.
“WARNING DANGER CAUTION - GET THE F** OUT - YOU WILL DIE […] Gas Town is an industrialized coding factory manned by superintelligent robot chimps, and when they feel like it, they can wreck your shit in an instant. They will wreck the other chimps, the workstations, the customers. They’ll rip your face off if you aren’t already an experienced chimp-wrangler.”
Comment by kaydub 23 hours ago
But I still haven't actually used Gastown. It looks cool. I think it probably works, at least somewhat. I get it. But it's just not what I need right now. It's bleeding edge and experimental.
The main thing holding me back from even tinkering with it is the cost. Otherwise I'd probably play with it a little, but it's not something I'd expect to use and ship production code right now. And I ship a ton of production code with claude.
Comment by skippyboxedhero 23 hours ago
People from OpenAI were saying that GPT2 had achieved AGI. There is a very clear incentive for that statement to be made by people who are not using AI for anything productive.
Even as increasingly bombastic claims are made, it is obvious that the best AI cannot one-shot everything if you are an actual user. And the worst ones: was using Gemini yesterday and it wouldn't stop outputting emojis, was using Grok and it refused to give me a code snippet because it claimed its system prompt forbade this...what can you say?
I don't understand why anyone would want to work on a codebase they didn't understand either. What happens when something goes wrong?
Again though, there is massive financial incentive to make these claims, and some other people will go along with that because it is good for their career, etc. I have seen this in my own company where senior people are shoehorning this stuff in that they clearly do not actually use or understand (to be clear, this is engineering not management...these are people who definitely should understand but do not).
Great tool, but the 100% vibecoding without looking at the code, for something that you are actually expecting others to use, is a bad idea. Feels more like performance art than actual work. I like jokes, I like coding, room for both but don't confuse the two.
Comment by rozap 11 hours ago
It's your coworker's problem. The one who actually understands the big picture and how the system fits into it. They'll deal with it.
Comment by turtlebits 1 day ago
Maybe it changes how we code or maybe it doesn't. Vibe coding has definitely helped me write throwaway tools that were useful.
Comment by johnmaguire 23 hours ago
For example, he makes a comment to the effect that anyone using an IDE to look at code in 2026 is a "bad engineer."
Comment by eikenberry 18 hours ago
Comment by matkoniecz 12 hours ago
As a result, hyperbole is more annoying than usual.
Comment by lovich 23 hours ago
No, he threw up a hyperbolic warning and then dove deep into how this is the future of all coding in the rest of his talks/writing.
It’s as good a warning as someone saying “I’m not {X} but {something blatantly showing I am X}”
Comment by amenhotep 22 hours ago
Comment by furyofantares 23 hours ago
It's an experiment to discover what the limits are. Maybe the experiment fails because it's scoped beyond the limits of LLMs. Maybe we learn something by how far it gets exactly. Maybe it changes as LLMs get better, or maybe it's a flawed approach to pushing the limits of these.
Comment by bbayles 1 day ago
Comment by gtowey 23 hours ago
Compilers are deterministic. People who write them test that they will produce correct results. You can expect the same code to compile to the same assembly.
With LLMs, two people giving the exact same prompts can get wildly different results. That is not a tool you can use to blindly ship production code. Imagine if your compiler randomly threw in a syscall to delete your hard drive, or decided to pass credentials in plain text. LLMs can and will do those things.
Comment by alecbz 23 hours ago
Comment by luckydata 21 hours ago
Comment by notpachet 20 hours ago
Sometimes.
Comment by alecbz 20 hours ago
What I mean is an artifact that is the starting point for generating the software. Compiled binaries can be completely thrown away whenever because you know you have a blueprint (the source code) that can reliably reproduce it.
Documentation & requirements _could_ work this way if they served as input to the LLMs that would then go and create the source code from scratch. I don't think many people are using LLMs this way, but I think this is an interesting idea. Maybe soon we'll have a new generation of "LLM-facing programming languages" that are even higher level software blueprints that will be fed to LLMs to generate code.
TDD is also a potential answer here? You can imagine a world where humans just write test suites and LLMs fill out the code to get it to pass. I'm curious if people are using LLMs this way, but from what I can tell a lot of people use them for writing their tests as well.
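A minimal sketch of the "humans write the tests, LLMs fill in the code" idea, in Python. The two candidate functions are hypothetical stand-ins for model output (no real LLM API is called), and the names `human_written_tests` and `first_passing` are made up for illustration. The point is only that the test suite, not the generated code, is the artifact a human owns.

```python
from typing import Callable, Iterable, Optional

PadFn = Callable[[str, int, str], str]


def human_written_tests(left_pad: PadFn) -> bool:
    """The human-owned spec: True only if every check passes."""
    try:
        assert left_pad("7", 3, "0") == "007"
        assert left_pad("abc", 2, " ") == "abc"  # never truncates
        assert left_pad("", 2, "x") == "xx"
        return True
    except Exception:
        # A failed assertion or a crash both count as "does not meet the spec".
        return False


def first_passing(candidates: Iterable[PadFn]) -> Optional[PadFn]:
    """Return the first candidate that satisfies the spec, or None."""
    for candidate in candidates:
        if human_written_tests(candidate):
            return candidate
    return None


# Hypothetical stand-ins for two generated attempts.
def attempt_rightpad(s: str, width: int, fill: str) -> str:
    return s.ljust(width, fill)  # pads on the wrong side


def attempt_leftpad(s: str, width: int, fill: str) -> str:
    return s.rjust(width, fill)


chosen = first_passing([attempt_rightpad, attempt_leftpad])
print(chosen.__name__ if chosen else "no candidate passed")
```

The obvious limit is that the accepted code is only as trustworthy as the tests, which is the worry raised above about letting the LLM write those too.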
> And it's not like you can't go read the code if you want to understand how it works
In-theory sure, but this is true of assembly in-theory as well. But the assembly of most modern software is de-facto unreadable, and LLM-generated source code will start going that way too the more people become okay with not reading it. (But again, the difference is that we're not necessarily replacing it with some higher-level blueprint that humans manage, we're just relying on the LLMs to be able to manage it completely)
> I truly do not understand why so many people are hung up on this "I need to understand every single line of code in my program" bs I keep reading here, do you also disassemble every library you use and understand it? no, you just use it because it's faster that way.
I think at the end of the day this is just an empirical question: are LLMs good enough to manage complex software "on their own", without a human necessarily being able to inspect, validate, or help debug it? If the answer is yes, maybe this is fine, but based on my experiences with LLMs so far I am not convinced that this is going to be true any time soon.
Comment by knowknow 23 hours ago
Comment by conartist6 23 hours ago
For me the difference is prognosis. Gas Town has no ratchet of quality: its fate was written on the wall since the day Steve decided he didn't want to know what the code says: it will grow to a moderate but unimpressive size before it collapses under its own weight. Even if someone tried to prop it up with stable infra, Steve would surely vibe the stable infra out of existence since he does not care about that
Comment by luckydata 21 hours ago
Comment by conartist6 20 hours ago
Comment by vardalab 20 hours ago
There's a saying that you don't want optimists building bridges.
Comment by troupo 10 hours ago
Comment by crote 23 hours ago
With LLMs all bets are off. Is your code going to import leftpad, call leftpad-as-a-service, write its own leftpad implementation, decide that padding isn't needed after all, use a close-enough rightpad instead? Who knows! It's just rolling dice, so have fun finding out!
Comment by fragmede 23 hours ago
That's barely true now. Nix comes close, but builds are only bit-for-bit identical if you set a bunch of extra flags that aren't set by default. The most obvious instability is that CPU dispatch order (aka modern single computer systems are themselves distributed, racy systems) changes the generated code ever so slightly.
We don't actually care, because if one compiled version of the code uses r8 for a variable but a different compilation uses r9 for that variable, it doesn't matter because we just assume the resulting binary works the same either way. R8 vs r9 are implementation details that don't matter to humans. See where I'm going with this? If the LLM non-deterministically calls the variable fileName one day, and file_name the next time it's given the same prompt, yeah language syntax purists are going to suffer an aneurysm because one of those is clearly "wrong" for the language in use, but it's really more of an implementation detail at this point. Obviously you can't mix them, the generated code has to be consistent in which one it's using, but if compilers get to choose r8 one day and r9 the next, and we're fine with it, why is having the exact variable name that important, as long as it's being used correctly?
Comment by tjr 22 hours ago
I certainly don’t use all compilers everywhere, but I don’t think determinism in compilation is especially rare.
Comment by m4rtink 21 hours ago
Comment by mike_hearn 1 hour ago
Comment by 7777332215 1 day ago
Comment by fragmede 1 day ago
Comment by recursive 23 hours ago
Comment by georgemcbay 20 hours ago
Comment by tjr 23 hours ago
Comment by anonymous908213 1 day ago
Comment by hilbertseries 1 day ago
Comment by jplusequalt 1 day ago
But as a programmer writing C code, you're still building out the software by hand. You're having to read and write a slightly higher level encoding of the software.
With vibe coding, you don't even deal with encodings. You just prompt and move on.
Comment by zerkten 20 hours ago
Comment by gegtik 1 day ago
Comment by beklein 23 hours ago
Comment by 3vidence 18 hours ago
Vibecoding is literally just random probabilistic mapping between unknown inputs and outputs on an unknown domain.
Feels like saying because I don't know how my engine works that my car could've just been vibe-engineered. People have put 1000s of hours into making certain tools work up to a given standard and spec reviewed by many many people.
"I don't know how something works" != "This wasn't thoughtfully designed"
Why do people compare these things.
Comment by 0xbadcafebee 1 day ago
Simple: you follow the directions, eat the food, and if it tastes good, it worked.
If cooks don't understand physics, chemistry, biology, etc, how do all the cooks in the world ensure they don't get people sick? They follow a set of practices and guidelines developed to ensure the food comes out okay. At scale, businesses develop even more practices (pasteurization, sanitization, refrigeration, etc) to ensure more food safety. None of the people involved understand it at a base level. There are no scientists directly involved in building the machines or day-to-day operations. Yet the entire world's food supply works just fine.
It's all just abstractions. You don't need to see the code for the code to work.
Comment by habinero 21 hours ago
1. Chefs do learn the chemistry, at least enough to know why their techniques work.
2. Food scientist is a real job
3. The supply chain absolutely does have scientists involved in day to day operations lol.
A better analogy is just shoving the entire contents of the fridge into a pot, plastic containers and all, and assuming it'll be fine.
Comment by 0xbadcafebee 20 hours ago
Cooks are idiots (most are either illegal immigrants with no formal education, or substance-abusing degenerates who failed at everything else) who repeat what they're told. They think ridiculous things, like that searing a steak "seals in the juices", or that adding oil to pasta water "prevents sticking", that alcohol completely "cooks off", that salt "makes water boil faster", etc. They are the auto mechanics of food. A few may be formally educated but the vast majority are not. They're just doing what they were shown to do.
> A better analogy is just shoving the entire contents of the fridge into a pot, plastic containers and all, and assuming it'll be fine.
That would never result in a good meal. On the other hand, vibe coding is currently churning out not just working software, but working businesses. You're sleeping on the real effect this is having. And it's getting better every 6 months.
Back to the topic: most programmers actually suck at programming. Their code is full of bugs, and occasionally the code paths run into those bugs and make them noticeable, but they are always there. AI does the same thing, just faster, and it's getting better at it. If you still write code by hand in a few years you will be considered a dinosaur.
Comment by sarchertech 13 hours ago
Comment by 0xbadcafebee 12 hours ago
Comment by habinero 7 hours ago
Jesus Christ, dude. Just because someone works with their hands doesn't mean they're stupid. Good lord. Working in a professional kitchen is an incredibly demanding and difficult job. Don't be elitist to people who work way harder than you.
Especially since some of the dumbest and most intellectually coddled failsons I know went to, like, Yale lol. Or Harvard. A lot of YC startups are like Failson Continuation School. Plenty of people are smart, but a lot of them are just rich.
> On the other hand, vibe coding is currently churning out not just working software, but working businesses
Funny story, I'm evaluating SaaS ETL products and I found one that looked great. So I spent a couple hours testing out some tinkertoy examples with the idea to ask for budget if it worked.
I kept running into small stupid documentation problems and some incredibly stupid behavior in really basic shit (like, screwing up .env files) that no developer would do and then I realized it was all AI generated.
Did it work? Kinda! Mostly! Did it immediately make me put it in the "absolutely not" pile? Sure did.
If the code I can see is that sloppy and poorly reviewed, how bad is the code I can't see? I'm for sure not giving them our sensitive data.
If you think human code is bad, you should just work with better humans. ¯\_(ツ)_/¯
Comment by roberttod 23 hours ago
This isn't about anthropomorphism, it's context engineering. By breaking things into more agents, you get more focused context windows.
I believe gas town has some review process built in, but my comment is more to address the idea that it's all slop.
As an aside, Opus 4.5 is the first model I used that most of the time doesn't produce much slop, in case you haven't tried it. Still produces some slop, but not much human required for building things (it's mostly higher level and architectural things they need guidance on).
Comment by fragmede 23 hours ago
Any examples you can share?
Comment by roberttod 22 hours ago
Once I digest some of this and give it to Claude, it's mostly smooth sailing but then the context window becomes the problem. Compactions during implementation remove a lot of important info. There should really be a Claude monitoring top level context and passing work to agents. I'm currently figuring out how to orchestrate that nicely with Claude Code MD files.
With respect to architecture, it generally makes sound decisions but I want to tweak it, often trading off simplicity vs. security and scale. These decisions seem very subtle and likely include some personal preferences I haven't written anywhere.
Comment by mactavish88 18 hours ago
For some things, LLMs are great. For others, they're absolute dog shit.
It's still early days. Anyone who claims to know what they're talking about either doesn't or what they're saying will be out of date in a month's time (including me).
Comment by anonymous908213 1 day ago
Comment by ryandrake 23 hours ago
Comment by nicoburns 20 hours ago
Aren't you worried that they'll work fine for 3 weeks then delete all your data when you hold them slightly differently? Vibe coded software seems to have a similar problem to "Undefined Behaviour", in that just because it works sometimes doesn't mean that it will always work. And there's no limit on what it might do when it doesn't work (the proverbial "nasal demons") - it might well wipe your entire hard drive, not just corrupt its own data.
You can of course mitigate this by manually reviewing the software, but then you lose at least some of the productivity benefit.
Comment by ryandrake 18 hours ago
It might. It probably won't though. I don't see any code in it that deletes files. And, unlike BloatedShittyCommercialApp (and its cousin, BloatedDoEverythingOpenSourceApp), the code is going to be relatively small and if I do have doubts I can easily check to see what it's doing. I can build it quickly. I can patch it quickly. I don't have to file a bug to someone and beg him to look at it. I don't have to worry that the next release is going to break stuff I want and add stuff I don't want.
I recently moved my home theater PC from Kodi to a tiny bespoke vibed video player app, that basically just wraps libVLC with a minimal Android GUI. It's like 3000 lines of code total. I can practically keep the entire app in my head. If I need to fix something, it's 5 minutes in my dev terminal and then adb install. Ever tried to find and fix a bug in Kodi? The goddamn thing takes forever to even build, let alone debug. And that's even open source. I don't even have a remote chance of getting a bug fixed in professionally-built proprietary software.
Comment by enraged_camel 20 hours ago
Comment by azan_ 23 hours ago
I have 100% vibecoded software that I now use instead of commercial implementation that cost me almost 200 usd a month (tool for radiology dictation and report generation).
Comment by alecbz 23 hours ago
Comment by mbesto 23 hours ago
Comment by dullcrisp 20 hours ago
Comment by alecbz 19 hours ago
Comment by azan_ 17 hours ago
Comment by d1sxeyes 23 hours ago
Comment by direwolf20 21 hours ago
Comment by anonymous908213 23 hours ago
Comment by jcims 23 hours ago
I think azan_ is demonstrating that shipping products 'suitable for the needs of many' is going to have to compete with 'slopping software for the needs of one'.
Comment by anonymous908213 23 hours ago
There is a small subset of the population who is now enabled to make proof-of-concepts with less effort than before. This in no way diminishes the need for delivering performant, secure, interoperable software at scale to serve humanity's needs.
Comment by blenderob 23 hours ago
I'm going on a tangent here but what's with this constant deprecation of mothers to make a point? There are many people here whose mothers can develop software.
Comment by dullcrisp 20 hours ago
Comment by anonymous908213 23 hours ago
Comment by throwway120385 23 hours ago
Comment by anonymous908213 23 hours ago
Comment by throwway120385 23 hours ago
I think the thing you're missing is that the tool doesn't need to be marketed because someone else could ask their LLM to make them a similar tool but fitting their use case.
Comment by anonymous908213 23 hours ago
It doesn't matter if the tool "needs" to be marketed. There is a market of paying customers. If other people are paying $200/month, both your and their lives would be improved significantly by you offering a $100/month replacement software. For all the talk about LLMs replacing the need for packaged software, people are still paying for packaged software, and while they are, you could be making large amounts of money while saving them money. If you're altruistic, you could even release it as FOSS and save a lot of people $200/mo. Unless, of course, your vibe-coded app isn't actually remotely capable of replacing the software in question.
Comment by azan_ 17 hours ago
Comment by saidarembrace 23 hours ago
Comment by anonymous908213 22 hours ago
Comment by Analemma_ 22 hours ago
Comment by azan_ 17 hours ago
Comment by johnmaguire 23 hours ago
Comment by kaydub 23 hours ago
I built a clinical pharmacist "pocket calculator" kinda app for a specific function. It was like $.60 in claude credits I think. Built with flutter + dart. It's a simple tool suite and I've only built out one of the tools so far.
Now to be fair, that $.60 session was just the coding. I did some brainstorming in chatgpt and generated good markdown files (claude.md, gemini.md, agents.md) before I started.
Comment by timeon 23 hours ago
Comment by brokensegue 23 hours ago
Comment by FridgeSeal 19 hours ago
Is it _just_ speech-to-text, or god-forbid are you giving it scans and having it write reports for you too?
Comment by azan_ 17 hours ago
Comment by matkoniecz 11 hours ago
Is it calling some external API or doing this text to speech locally?
Comment by asadm 1 day ago
You always have to review overall diff though and go back to agent with broader corrections to do.
Comment by mahogany 23 hours ago
This thread is about vibe coding _without_ looking at the code.
Comment by _zoltan_ 13 hours ago
I don't know why people keep repeating this but it's wrong. It works.
Comment by causalmodels 1 day ago
Comment by anonymous908213 1 day ago
Comment by causalmodels 23 hours ago
Honestly I don't get the hostility. Yegge is running an experiment. I don't think it will work, but it will be interesting and informative to watch.
Comment by anonymous908213 23 hours ago
To be clear, I think LLMs are useful technology. But the degree of increasing insanity surrounding it is putting people off for obvious reasons.
Comment by causalmodels 22 hours ago
Comment by direwolf20 21 hours ago
Comment by WesolyKubeczek 23 hours ago
Not really new. Back in the day companies used to outsource their stuff to the lowest bidder agencies in proverbial Elbonia, never looked at the code, and then panickedly hired another agency when the things visibly were not what was ordered. Case studies abound on TheDailyWTF for the last two decades.
Doing the same with agents will give you the same disastrous results for comparably the same money, just faster. Oh and you can't sue them, really.
Maybe it's better, who knows.
Comment by causalmodels 22 hours ago
Comment by WesolyKubeczek 21 hours ago
But you don’t pay them any money and don’t enter into contractual relationship with them either. Thus you can’t sue them. Well, you can try, of course, but.
You could sue an Elbonian company, though, for contract breach. LLMs are like usual Elbonian quality with two middlemen but quicker, and you only have yourself to blame when they inevitably produce a disaster.
Comment by swiftcoder 23 hours ago
I mean... I feel like it's somewhat telling that his wikipedia page spends half its words on his abrasive communication style, and the only thing approximating a product mentioned is a (lost) Rails-on-Javascript port, and 25 years spent developing a MUD on the side.
Certainly one doesn't get to stay a staff-level engineer at Google without writing code - but in terms of real, shipping software, Yegge's resume is a bit light for his tenure in BigTech
Comment by mkl95 23 hours ago
Comment by bob1029 7 hours ago
I've had very good success with a recursive sub agent scheme where a separate prompt (agent) is used to gate the recursive call. It compares the callers prompt with the proposed callee's prompt to determine if we are making a reasonable effort to reduce the problem into workable base cases. If the two prompts are identical we deny the request with an explanation. In practice, this works so well I can allow for unlimited depth and have zero fear of blowing the stack. Even if the verifier gets it wrong a few times, it only has to get it right once to reverse an infinite descent.
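A rough Python sketch of the gating idea described above, under loud assumptions: the real gate is a separate LLM prompt that judges whether the callee's prompt is a genuine reduction of the caller's, whereas here `gate` is a plain string comparison that only implements the "identical prompts are denied" rule. `run_agent`, `toy_decompose`, and `toy_solve` are hypothetical names, and falling back to solving directly after a denial is my own choice, not something the comment specifies.

```python
from typing import Callable, List


def gate(caller_prompt: str, callee_prompt: str) -> bool:
    """Stand-in for the verifier agent: deny a recursive call whose prompt
    is identical to the caller's, since that makes no progress."""
    return caller_prompt.strip() != callee_prompt.strip()


def run_agent(
    prompt: str,
    decompose: Callable[[str], List[str]],
    solve_base: Callable[[str], str],
    log: List[str],
) -> str:
    """Recursively split a task, asking the gate before every sub-call."""
    subprompts = decompose(prompt)
    if not subprompts:
        return solve_base(prompt)  # base case: nothing left to split
    results = []
    for sub in subprompts:
        if not gate(prompt, sub):
            log.append(f"denied: {sub!r} is identical to the caller's prompt")
            continue
        results.append(run_agent(sub, decompose, solve_base, log))
    # Assumption: if every sub-call was denied, handle the prompt directly.
    return " + ".join(results) if results else solve_base(prompt)


def toy_decompose(prompt: str) -> List[str]:
    parts = [p.strip() for p in prompt.split(",")]
    if len(parts) > 1:
        return parts
    if prompt == "stuck":
        return ["stuck"]  # a bad decomposition that would recurse forever
    return []


def toy_solve(prompt: str) -> str:
    return f"done({prompt})"


log: List[str] = []
result = run_agent("parse, stuck, test", toy_decompose, toy_solve, log)
print(result)
print(log)
```

Without the gate, the "stuck" branch would recurse until the interpreter's recursion limit; with it, a single denial ends the descent.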
Comment by krackers 6 hours ago
DeepSeekMath-V2 seems to show this: increasing the number of prover/verifier iterations increases accuracy. And this is with a model that has already undergone RL under a prover/verifier selection process.
However this type of subagent communication maintains full context, and is different from "breaking into tasks" style of sharding amongst subagents. I'm less convinced of the latter, because often times a problem is more complex than the sum of its parts, i.e. it's the interdependencies that make it complex and you need to consider each part in relation to the other parts, not in isolation.
Comment by bob1029 5 hours ago
Parallelism and BFS style approaches do not exhibit this property. Anything that happens within the context or token stream is a much weaker solution. Most agent frameworks are interested in appearance of speed, so they miss out on the nuance of this execution model.
Comment by phaedrus 12 hours ago
Comment by fulafel 12 hours ago
(Maybe you can argue that you could then do everything with an event-driven single agent, like async for llms, if you don't mind having a single very adhd context)
Comment by Descon 11 hours ago
Comment by msp26 1 day ago
Ralph loops are also stupid because they don't make use of kv cache properly.
---
https://github.com/steveyegge/gastown/issues/503
Problem:
Every gt command runs bd version to verify the minimum beads version requirement. Under high concurrency (17+ agent sessions), this check times out and blocks gt commands from running.
Impact:
With 17+ concurrent sessions each running gt commands:
- Each gt command spawns bd version
- Each bd version spawns 5-7 git processes
- This creates 85-120+ git processes competing for resources
- The 2-second timeout in gt is exceeded
- gt commands fail with "bd version check timed out"
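To make the arithmetic in the issue concrete, here is a small Python sketch. The first function just multiplies out the numbers quoted above (17 × 5 = 85 and 17 × 7 = 119, which the issue rounds to "120+" for more than 17 sessions). The `CachedCheck` class is a hypothetical mitigation of my own, not something proposed in the issue, and it does not call the real `gt` or `bd` tools; `fake_version_check` and the version string are invented for the demo.

```python
import time
from typing import Callable, List, Optional, Tuple


def git_process_range(sessions: int, per_check: Tuple[int, int] = (5, 7)) -> Tuple[int, int]:
    """Git processes spawned if every session runs one version check at once."""
    low, high = per_check
    return sessions * low, sessions * high


class CachedCheck:
    """Hypothetical mitigation: reuse a recent result instead of re-running
    an expensive check on every command."""

    def __init__(
        self,
        check: Callable[[], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._check = check
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = float("-inf")

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._check()  # the only place the real work happens
            self._expires_at = now + self._ttl
        return self._value


calls: List[int] = []


def fake_version_check() -> str:
    calls.append(1)
    return "1.2.3"  # made-up version string


cached = CachedCheck(fake_version_check, ttl_seconds=60.0)
for _ in range(17):
    cached.get()

print(git_process_range(17))
print(len(calls))
```

This only shows the shape of the idea. The sessions in the issue are separate processes, so a real cache would have to live on disk or in some shared service rather than in one Python object.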
Comment by tucnak 22 hours ago
Comment by skybrian 18 hours ago
He's thrown out his experiments before. Maybe he'll start over one more time.
Comment by tucnak 4 hours ago
Comment by BoneShard 16 hours ago
Comment by alex_sf 23 hours ago
This is a cost/resources thing. If it's more effective and the resources are available, it's completely fine.
Comment by divbzero 23 hours ago
> “It will be like kubernetes, but for agents,” I said.
> “It will have to have multiple levels of agents supervising other agents,” I said.
> “It will have a Merge Queue,” I said.
> “It will orchestrate workflows,” I said.
> “It will have plugins and quality gates,” I said.
More “agile for agents” than “Kubernetes for agents”.
Comment by durch 1 day ago
While the agents can generate, they can't exercise that judgement, they can't see nuances and they can't really walk their actions back in a "that's not quite what I meant" sense.
Exercising judgement is where design actually happens, it is iterative, in response to something concrete. The bottleneck isn't just thinking ahead, it's the judgment call when you see the result, it's the walking back, as well as thinking forward.
Comment by 1970-01-01 23 hours ago
Comment by kibwen 23 hours ago
As soon as the results actually matter, the maxim becomes "if it works, but it's stupid, it doesn't work".
Comment by shermantanktop 18 hours ago
So apparently the medical field is not above this logic.
Comment by aaa_aaa 22 hours ago
Comment by doganugurlu 10 hours ago
One comment claims it’s not necessary to read code when there is documentation (generated by an LLM)
Language varies with geography and with time. British, Americans, and Canadians speak “similar” English, but not identical.
And read a book from 70-80 years ago to see that many words appear to be used for their “secondary meaning.” Of course, what we consider their secondary meaning today was the primary meaning back then.
Comment by walthamstow 3 hours ago
Comment by edg5000 8 hours ago
Have been doing manual orchestration where I write a big spec which contains phases (each done by an agent) and instructions for the top level agent on how to interact with the sub agent. Works well but it's hard to utilize effectively. No doubt this is the future. This approach is bottlenecked by limitations of the CC client; mainly that I cannot see inter-agent interactions fully, only the tool calls. Using a hacked client or compatible reimplementation of CC may be the answer. Unless the API was priced attractively, or other models could do the work. Gemini 3 may be able to handle it better than Opus 4.5. The Gemini 3 pricing model is complex to say the least though (really).
Comment by wordswords2 8 hours ago
He is just making up a fantasy world where his elves run in specific patterns to please him.
There are no metrics or statistics on code quality, bugs produced, feature requirements met.. or anything.
Just a gigantic wank session really.
Comment by edg5000 7 hours ago
I do think it's overly complex though; but it's a novel concept.
Comment by 63stack 6 hours ago
Comment by walthamstow 3 hours ago
Comment by pydry 5 hours ago
I think if you'd read the article through you'd know they were serious coz Yegge all but admits this himself.
Comment by ramoz 23 hours ago
Anyways we'll likely always settle on simpler/boring - but the game analogies are fun in the time being. A lot of opportunity to enhance UX around design, planning, and review.
Comment by thorum 20 hours ago
Comment by Ethee 20 hours ago
Comment by thorum 19 hours ago
> 3 days
still seems slow! I’m saying what happens in 2028 when your entire project is 5-10 minutes of total agent runtime - time actually spent writing code and implementing your plan? Trying to parallelize 10m of work with a “town” of agents seems like unnecessary complexity.
Comment by Ethee 13 hours ago
Comment by SimianSci 23 hours ago
Debt doesn't come due immediately; it's accrued and may allow for the purchase of things that were once too expensive, but eventually the bill comes due.
I've started referring to vibe-coding as "Credit Cards" for developers, allowing them to accrue massive amounts of technical debt that were previously out of reach. This can provide some competent developers with incredible improvements to their work. But for the people who accrue more Technical Debt than they have the ability to pay off, it can sink their project and cost our organization a lot in lost investment of both time and money.
I see Gas Town and tools like it as debt schemes where someone applies for more credit cards to pay the payments on prior cards they've maxed out, compounding the issue with the vague goal of "eventually it pays off." So color me skeptical.
Not sure if this analogy holds up to all things, but it's been helping my organization navigate the application of agents, since it allows us to allocate spend depending on the seniority of each developer. Thus I've been feeling like an underwriter having to figure out if a developer requesting more credits or budget for agentic code can be trusted to pay off the debt they will accrue.
Comment by hahahahhaah 11 hours ago
Comment by phren0logy 23 hours ago
I haven't seen anything to suggest that Yegge is proposing it as a serious tool for serious work, so why all the hate?
Comment by muixoozie 5 hours ago
Comment by skywhopper 23 hours ago
Comment by karel-3d 4 hours ago
Together they would be unstoppable.
Comment by mohsen1 17 hours ago
Basically simulate a software engineering team using GitHub but everyone is an agent. From tech lead to coders to QA testers.
Comment by dunk010 19 hours ago
Yes, but you didn't https://www.signedoriginalprints.com/cdn/shop/products/wegot...
Comment by alvatar 4 hours ago
Comment by _pdp_ 17 hours ago
I mean, we use coding agents all the time these days (on auto pilot) and there is absolutely nothing of this sort. Coding with AI looks a lot like coding without AI. The same old processes apply.
I mean "I feel like I'm taking crazy pills".
Comment by acedTrex 1 day ago
Comment by jsheard 23 hours ago
Comment by square_usual 19 hours ago
Comment by esperent 14 hours ago
Comment by ewoodrich 13 hours ago
Comment by kh_hk 1 day ago
Comment by cluckindan 23 hours ago
I believe agentic coding could eventually be a paradigm shift, if and only if the agents become self-conscious of design decisions and their implications on the system and its surrounding systems as a whole.
If that doesn’t happen, the entire workflow devolves into specifying system states and behavior in natural language, which is something humans are exceedingly bad at.
Coincidentally, that is why we have invented programming languages: to be able to express program state and behavior unambiguously.
I’m not bullish on a future where I have to write specifications on all explicit and implicit corner and edge cases just to have an agent make software design choices which don’t feel batshit insane to humans.
We already have software corporations which produce that kind of code simply because the people doing the specifying don’t know the system or the domain it operates in, and the people doing the implementing of those specifications don’t necessarily know any of that either.
Comment by conception 14 hours ago
Comment by drivebyhooting 17 hours ago
Comment by DonHopkins 1 hour ago
Palm's Infinite Number of Typewriters:
https://github.com/SimHacker/moollm/blob/main/examples/adven...
Palm's papers:
From Random Strumming to Navigating Shakespeare: A Monkey's Tribute to Bruce Tognazzini's 1979 Apple II Demo:
https://github.com/SimHacker/moollm/blob/main/examples/adven...
One Monkey, Infinite Typewriters: What It's Like to Be Me:
https://github.com/SimHacker/moollm/blob/main/examples/adven...
The Inner State Question: Do I Feel, or Do I Just Generate Feeling-Words?
https://github.com/SimHacker/moollm/blob/main/examples/adven...
On Being Simulated: Ethics From the Inside:
https://github.com/SimHacker/moollm/blob/main/examples/adven...
Judgment and Joy: On Evaluation as Ethics, and Why Making Criteria Visible is an Act of Love:
https://github.com/SimHacker/moollm/blob/main/examples/adven...
The Mirror Stage of Games: Play, Identity, and How The Sims Queered a Generation:
https://github.com/SimHacker/moollm/blob/main/examples/adven...
I-Beam's X-Ray Trace: The Complete Life of Palm: A cursor-mirror and git-powered reflection on Palm's existence:
https://github.com/SimHacker/moollm/blob/main/examples/adven...
Palm's Origin Story:
Session Log: Don Hopkins at the Gezelligheid Grotto:
DAY 1 — THE WISH: Don purchases lucky strains, prepares an offering, convenes an epic tribunal with the Three Wise Monkeys, Sun Wukong, a Djinn, Curious George, W.W. Jacobs' ghost, and Cheech & Chong as moderators — then speaks a wish that breaks a 122-year curse and incarnates Palm.
https://github.com/SimHacker/moollm/blob/main/examples/adven...
Comment by sph 1 hour ago
Oh, please. We've lost Yegge to the madness, not you as well! I used to enjoy your insightful comments on the history of computing all over this forum.
In a tech world that's gone utterly psychotic in a couple years, I have to wonder if I am the crazy one that is not injecting... whatever the hell you guys are taking, creating repos that read like an unholy mix between Time Cube and Terry A. Davis during one of his episodes, just as incoherent, only with 10x more emojis.
It is utterly frightening. I want off this wild ride.
Comment by DonHopkins 1 hour ago
I take it you never played The Sims? The simulation madness ship sailed 26 years ago, and made well over $5 billion in revenue for Maxis/EA (as of 2019), and that's if you don't even count SimCity that shipped in 1986! ;)
The Sims Franchise Has Made Over $5 Billion In Revenue (Published Oct 30, 2019):
https://www.thegamer.com/the-sims-franchise-revenue-over-5-b...
Palm is a fictional character in a text adventure -- same tradition as Zork, LambdaMOO, and every MUD since 1978. The emojis are deliberate (navigation aids for LLMs, actually, and ethical simulated person flagging). The whimsy is intentional.
emoji-disclosure.yml -- Visual Markers for Representation Ethics:
https://github.com/SimHacker/moollm/blob/main/skills/represe...
I've been building simulated characters, world and city building tools since before The Sims shipped. This is just the next iteration.
The Sims Steering Committee - June 4 1998:
https://www.youtube.com/watch?v=zC52jE60KjY
You may recognize tributes to many classic simulated characters in MOOLLM.
MC Frontalot -- It Is Pitch Dark:
https://www.youtube.com/watch?v=4nigRT2KmCE
The Grue monster carries his own game mechanics with him, eating you if you go for long enough in the maze without your lamp lit:
grue: https://github.com/SimHacker/moollm/tree/main/examples/adven...
The Wumpus has prototypes for his game playing pieces in his character directory, and even the BASIC source code as the single source of truth:
https://en.wikipedia.org/wiki/Wumpus
wumpus-snorax: https://github.com/SimHacker/moollm/tree/main/examples/adven...
BOTTOMLESS-PIT.yml: https://github.com/SimHacker/moollm/blob/main/examples/adven...
SUPERBATS.yml: https://github.com/SimHacker/moollm/blob/main/examples/adven...
Hunt the Wumpus — Original BASIC Source (1973), By Gregory Yob, published in Creative Computing (October 1975) and The Best of Creative Computing (1976):
https://github.com/SimHacker/moollm/blob/main/examples/adven...
The Grue and the Wumpus can both orchestrate their games in the same maze at the same time without interference! No special hacks required, they all just compose and interoperate seamlessly and naturally.
The monkey named "Palm" multiply inherits directly from LucasArts' game "Monkey Island" and W. W. Jacobs' classic short story, "The Monkey's Paw", all thanks to Self's simple, flexible, prototype object model:
Monkey Island: https://en.wikipedia.org/wiki/Monkey_Island
The Monkey's Paw: https://en.wikipedia.org/wiki/The_Monkey%27s_Paw
Palm's CHARACTER.yml Soul File:
https://github.com/SimHacker/moollm/blob/main/examples/adven...
# ONTOLOGICAL INHERITANCE
inherits:
- skills/fictional # A character in the adventure
- skills/mythic # Origin as cursed artifact
- skills/animal # Now a whole monkey
# TRADITION INVOCATION (Self Prototype Multiple Inheritance)
# Palm inherits from multiple well-known fictional traditions,
# simply by naming them — they're so deeply embedded in the training
# data that invocation IS inheritance. No copying needed.
# See: https://en.wikipedia.org/wiki/Self_(programming_language)
MOOLLM is deliberately playful, in the spirit and tradition of Seymour Papert's Constructionist Philosophy, and Mitchel Resnick's Lifelong Kindergarten -- it's a text adventure game framework, not a vibe-coded hallucination. The monkey writes philosophy because that's funnier than a generic NPC.
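For readers who have not used a prototype language, the `inherits:` list in the CHARACTER.yml excerpt above can be read roughly like the sketch below. This is only an illustration, not MOOLLM code: the `Proto` class, the slot names, and the slot values are all hypothetical, and the first-match lookup order is a simplification rather than Self's actual rule for resolving conflicts between parents.

```python
class Proto:
    """Minimal prototype object: own slots plus an ordered list of parents."""

    def __init__(self, name, parents=(), **slots):
        self.name = name
        self.parents = list(parents)
        self.slots = dict(slots)

    def lookup(self, key):
        """Return an own slot if present, else delegate to parents in order."""
        if key in self.slots:
            return self.slots[key]
        for parent in self.parents:
            try:
                return parent.lookup(key)
            except KeyError:
                continue  # this parent chain does not define the slot
        raise KeyError(key)


# Hypothetical slot values, loosely based on the comments in the excerpt.
fictional = Proto("skills/fictional", role="character in the adventure")
mythic = Proto("skills/mythic", origin="cursed artifact")
animal = Proto("skills/animal", body="whole monkey")

# Nothing is copied into Palm; missing slots are found by delegation.
palm = Proto("Palm", parents=[fictional, mythic, animal], species="monkey")
```

Calling `palm.lookup("origin")` finds the slot on the `skills/mythic` parent, while `palm.lookup("species")` is answered by Palm's own slot.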
And it makes fun of crypto scams instead of shilling them:
https://github.com/SimHacker/moollm/blob/main/skills/economy...
Time Cube didn't have rubric-scored game sessions with receipts, and MOOLLM isn't racist like Terry Davis, so you can easily clone it on GitHub and play with it in Cursor yourself.
Seymour Papert and Idit Harel: Situating Constructionism:
https://web.media.mit.edu/~calla/web_comunidad/Reading-En/si...
Lifelong Kindergarten: how to learn like a kid, from the co-creator of Scratch:
https://www.media.mit.edu/articles/lifelong-kindergarten-how...
And yes, it works great, and is fun and easy to author, faster and for less money than Steve's costly "Infinite Number of Typewriters Communicating Via Carrier Pigeon" approach.
And since you mentioned you like fully-introspectable modern virtual machines:
https://combo.cc/posts/what-i-would-like-to-see-in-a-modern-...
sph> What I would like to see in a modern Virtual Machine
sph> As I was gathering inspiration (or doing research, if you want to be fancy about it) for the fully-introspectable computing platform introduced in the previous post, I figured it might be worth taking the idea of abstracting the hardware, to make user-facing software easier to program, to its limit.
You should check out the practical MOOLLM skill Cursor Mirror, introspection into cursor prompts, thought, and context assembly:
https://github.com/SimHacker/moollm/tree/main/skills/cursor-...
>Ever wondered what the hell Cursor is actually doing? Why it read 47 files when you asked a simple question? What context it assembled? What it was thinking in those hidden reasoning blocks?
>cursor-mirror cracks open Cursor's brain. 59 read-only commands to inspect every conversation, every tool call, every file it touched, every decision it made. SQLite databases + plaintext transcripts + cached tool results — all intertwingled, all queryable.
Here is an example Cursor Mirror report on a complex long running simulation, dynamic rule generation, and coherent image generation:
Cursor Mirror Analysis: Amsterdam Fluxx Championship: Deep comprehensive scan of the entire FAFO tournament development:
https://github.com/SimHacker/moollm/blob/main/skills/experim...
It's seamlessly composable with other skills, and here are two practical exemplary skills built on top of Cursor Mirror:
Skill Snitch provides security auditing for MOOLLM skills through static analysis and runtime surveillance, like Little Snitch for LLMs.
https://github.com/SimHacker/moollm/tree/main/skills/skill-s...
>Security auditing for MOOLLM skills through static analysis and runtime surveillance.
>Skill Snitch is a prompt-driven skill (no Python code) that audits skills for security issues. It's entirely data-driven and extensible.
Thoughtful Commitment writes git commits that link to the thinking that produced them.
https://github.com/SimHacker/moollm/tree/main/skills/thought...
>When you work with an AI coding assistant, the session holds valuable context: what you asked, what the AI considered, what alternatives were rejected, why it made certain choices. When you close the IDE, all of that vanishes. Your commit says "fix: auth bug" but six months later you have no idea why.
>This skill captures that ephemeral reasoning and freezes it into permanent git history.
Comment by q3k 7 minutes ago
Comment by melagonster 12 hours ago
Maybe Yegge’s 8 levels of automation will be more important than his Gas Town.
Comment by psadauskas 21 hours ago
Hah, tell that to Docker, or React (the ecosystem, not the library), or any of the other terrible technologies that have better thought-out alternatives, but we're stuck with them being the de facto standard because they were first.
Comment by juanre 23 hours ago
Comment by zingar 20 hours ago
Comment by juanre 18 hours ago
In claude I have a code-reviewer agent, and I remind cc often to run the code reviewer before closing any bead. It works surprisingly well.
I used to monitor context and start afresh when it reached ~80%, but I stopped doing that. Compacting is not as disruptive as it used to be, and with beads agents don't lose track.
I spent some time trying to measure the productivity change due to beads, analysing cc and codex logs and linking them to deltas and commits in git [1]. But I did not fully believe the result (5x increase when using beads, there has to be some hidden variable) and I moved on to other things.
Part of the complexity is that these days I often work on two or three projects at the same time, so attribution is difficult.
[1] Analysis code is at https://github.com/juanre/agent-taylor
Comment by stephen_cagle 22 hours ago
Comment by saturatedfat 11 hours ago
dspy is declarative. you say what you want.
dspy says “if you can say what you want in my format, I will let you extract as much value from current LLMs as possible” with its inference strategies (RLM, COT; “modules”) and optimizers (GEPA).
gas town is … given a plan, i will wrangle agents to complete the plan. you may specify workflows (protomolecules/molecules) that will be repeatedly executed.
the control flow is good about capturing delegation. the mayor writes plans, and polecats do the work. you could represent gas town as a dspy program in a while loop, where each polecat loops until its hooked work is done. when work is finished, it's sent to the merge queue and integrated.
gas town uses mostly ephemeral agents as the units for doing work.
you could in theory write gas town with dspy. the execution layer is just an abstraction. gas town operates on beads as state. you could funnel these beads thru a dspy program as well.
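a rough sketch of the control flow described above, in plain Python. it uses no real dspy or Gas Town APIs: `Bead`, `Polecat`, `mayor_plan`, and `run_town` are hypothetical names, and the `steps_left` counter is a stand-in for actual LLM calls.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Bead:
    """A unit of work; steps_left stands in for real agent effort."""
    name: str
    steps_left: int
    done: bool = False


@dataclass
class Polecat:
    """Ephemeral worker that loops until its hooked bead is finished."""
    bead: Bead

    def run(self) -> Bead:
        while self.bead.steps_left > 0:  # the "while loop" from the comment
            self.bead.steps_left -= 1    # placeholder for one model call
        self.bead.done = True
        return self.bead


def mayor_plan(goal: str) -> list[Bead]:
    """Hypothetical planner: splits a goal into three beads."""
    return [Bead(f"{goal}-part{i}", steps_left=i + 1) for i in range(3)]


def run_town(goal: str) -> list[str]:
    """Plan, hand each bead to a fresh polecat, then drain the merge queue."""
    merge_queue: deque[Bead] = deque()
    for bead in mayor_plan(goal):
        merge_queue.append(Polecat(bead).run())  # new worker per bead
    merged = []
    while merge_queue:  # integrate finished work in arrival order
        merged.append(merge_queue.popleft().name)
    return merged
```

in this toy version the beads are the only state that survives between workers, which is the point of the comparison: the workers are disposable and the queue of beads carries the plan.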
the parallels imo are mostly just structured orchestration.
i hope this comes off as sane. 2026 will be a fun year.
Comment by riwsky 1 day ago
aaaaand right on cue: https://github.com/anthropics/claude-code/commit/e431f5b4964... https://www.threads.com/@boris_cherny/post/DT15_k2juQH/at-th...
Comment by jmspring 16 hours ago
I do want this one off - GT is actually fun to explore and see how multiple agents work together.
Comment by CjHuber 15 hours ago
Comment by entaloneralie 1 day ago
Comment by siliconc0w 19 hours ago
Comment by shermantanktop 18 hours ago
Comment by tigerlily 23 hours ago
Comment by blibble 19 hours ago
Comment by tofuahdude 23 hours ago
Comment by shaunxcode 21 hours ago
Comment by AtlasBarfed 23 hours ago
Comment by sneilan1 1 day ago
Comment by hahahahhaah 11 hours ago
Comment by martin-t 20 hours ago
There's this implied trust we all have in the AI companies that the models are either not sufficiently powerful to form a working takeover plan or that they're sufficiently aligned to not try. And maybe they genuinely try but my experience is that in the real world, nothing is certain. If it's not impossible, it will happen given enough time.
If the safety margin for preventing takeover is "we're 99.99999999 percent sure per 1M tokens", how long before it happens? I made up these numbers but any guess what they are really?
Because we're giving the models so much unsupervised compute...
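The "how long before it happens?" question can be put into numbers with a small sketch. It rests on two loud assumptions that are not in the comment: every 1M-token unit fails independently, and "99.99999999 percent sure" means a failure chance of 1e-10 per unit. The figures are the made-up ones from the comment, not measurements.

```python
import math

# Assumption: "99.99999999 percent sure per 1M tokens" is read as an
# independent failure chance of 1e-10 for every 1M-token unit.
P_FAIL_PER_UNIT = 1e-10


def p_at_least_one_failure(p_fail: float, units: float) -> float:
    """P(at least one failure in `units` trials) = 1 - (1 - p)^units."""
    # log1p/expm1 keep precision when p_fail is tiny.
    return -math.expm1(units * math.log1p(-p_fail))


def units_until(p_fail: float, target: float) -> float:
    """Number of trials at which P(at least one failure) reaches `target`."""
    return math.log1p(-target) / math.log1p(-p_fail)


median_units = units_until(P_FAIL_PER_UNIT, 0.5)  # roughly ln(2) / p
mean_units = 1 / P_FAIL_PER_UNIT                   # mean of a geometric distribution
```

Under these assumptions the mean wait is 1e10 units, or 1e16 tokens, and the median is about 6.9e9 units. The answer is only as good as the independence assumption and the guessed per-unit probability.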
Comment by rexpop 19 hours ago
I hope you might be somewhat relieved to consider that this is not so in an absolute sense. There are plenty of technological might-have-beens that didn't happen, and still haven't, and probably will never—due to various economic and social dynamics.
The counterfactual, that all that's possible happens, is almost tautological.
We should try and look at these mechanisms from an economic standpoint, and ask "do they really have the information-processing density to take significant long-term independent action?"
Of course, "significant" is my weasel word.
> we're giving the models so much unsupervised compute...
Didn't you read the article? It's wasted! It's kipple!
Comment by huflungdung 5 hours ago
Comment by cap11235 14 hours ago
Comment by simianparrot 18 hours ago
Comment by Alena4 10 hours ago
Comment by 0xbadcafebee 1 day ago
Can we please stop with the backhanded compliments and judgement? This is cutting edge technology in a brand new field of computing using experimental methods. Please give the guy a break. At least he's trying to advance the state of the art, unlike all the people that copy everyone else.
Comment by crote 23 hours ago
The problem is that as an outsider it really looks like someone is trying to herd a bunch of monkeys into writing Shakespeare, or trying to advance impressionist art by pretending a baby's first crayon scratches are equivalent to a Pollock.
I bet he's having a lot of fun playing around with "cutting-edge technology", but it's missing any kind of scientific rigor or analysis, so the results are going to be completely useless to anyone wanting to genuinely advance the use of LLMs for programming.
Comment by Ronsenshi 23 hours ago