AI should only run as fast as we can catch up

Posted by yuedongze 1 day ago

190174Original

Comments

Comment by donatj 22 hours ago

I have been a developer for twenty years now. For me to trust code, my want is to understand every single line. I learned long ago working on projects with a team that that becomes impossible for a single person on large projects. I learned to trust that someone understands the code and between blames and Slack I can almost always hunt that person down.

More and more often, while doing code review, I find I will not understand something and I will ask, and the "author" will clearly have no idea what it is doing either.

I find it quite troubling how little actual human thought is going into things. The AIs context window is not nearly large enough to fully understand the entire scope of any decently sized applications ecosystem. It just takes small peaks at bits and makes decisions based on a tiny slice of the world.

It's a powerful tool and as such needs to be guided with care.

Comment by nradov 15 hours ago

We might have to give up on trust and understanding in complex domains. To draw an analogy from another field, pharmaceutical researchers often don't understand the exact mechanism of action for drugs they develop. Biological systems are too complex and much of the basic research hasn't been done yet. So they rely on rigorous testing to verify that new drugs are safe and effective. It isn't a perfect system — sometimes drugs get recalled or have warnings added later — but works well enough.

Comment by MLgulabio 18 hours ago

Software becomes legacy very fast.

I have seen so many projects were people who understood all of it, are just gone. They moved, did something else etc.

As soon as this happens, you no longer have anyone 'getting it'. You have to handle so many people adding/changing very thin lines across all components and you can only hope that the original people had enough foresight adding enough unit tests for core decisions.

So i really don't mind AI here anymore.

Comment by rnewme 9 hours ago

Not sure why this is dead, but in nearly all of my consulting gigs sooner or later I ended up having to check on project/service that is effectively abandoned. Last time this morning. Luckily I had claude code and CLI tools to go through few dozen repos and millions LOC to find some obscure endpoints and data structures, since there wasn't even anyone to ask what to look for.

Comment by fragmede 16 hours ago

Can humans though? There's a reason we don't just lump everything into one giant file and singleton class named DoIt(). Who hasn't come back around to some bit of code in a project and wondered what dumbass wrote this, only for the logs to tell you that it was you that wrote it, years ago. If AI is resulting in code that's more modular, in smaller digestible and understandable chunks, I'm not hearing that as a bad thing!

Comment by yuedongze 1 day ago

It's nice to see a wide array of discussions under this! Glad that I didn't give up on this thought and end up writing it down.

I want to stress that the main point of my article is not really about AI coding, it's about letting AI perform any arbitrary tasks reliably. Coding is an interesting one because it seems like it's a place where we can exploit structure and abstraction and approaches (like TDD) to make verification simpler - it's like spot-checking in places with a very low soundness error.

I'm encouraging people to look for tasks other than coding to see if we can find similar patterns. The more we can find these cost asymmetry (easier to verify than doing), the more we can harness AI's real potential.

Comment by felipeerias 1 day ago

Thinking about the relationship between creation and verification is a good way to develop productive workflows with AI tools.

One that works particularly well in my case is test-driven development followed by pair programming:

• “given this spec/context/goal/… make test XYZ pass”

• “now that we have a draft solution, is it in the right component? is it efficient? well documented? any corner cases?…”

Comment by Yoric 1 day ago

Note that in the case of coding, there is an entire branch of computer science dedicated to verification.

All the type systems (and model-checkers) for Rust, Ada, OCaml, Haskell, TypeScript, Python, C#, Java, ... are based on such research, and these are all rather weak in comparison to what research has created in the last ~30 years (see Rocq, Idris, Lean).

This goes beyond that, as some of these mechanisms have been applied to mathematics, but also to some aspects of finance and law (I know of at least mechanisms to prove formally implementations of banking contracts and tax management).

So there is lots to do in the domain. Sadly, as every branch of CS other than AI (and in fact pretty much every branch of science other than AI), this branch of computer science is underfunded. But that can change!

Comment by charcircuit 1 day ago

Considering how useful I've found AI at finding and fixing bugs proportional to the effort I put in, I question your claim that it's being underfunded. While I have learned things like Idris, in the end I never was able to practically use them to reduce bugs in the software I was writing unlike AI. It's possible that the funding towards these types of languages is actually distracting people from more practical solutions which could actually mean that it is overfunded in regards to program verification.

Comment by seanmcdirmid 1 day ago

Make the AI go to lots of meetings. It won’t stand a chance in keeping up its productivity.

Comment by zerosizedweasle 1 day ago

https://www.reuters.com/graphics/USA-ECONOMY/AI-INVESTMENT/g...

Comment by HPsquared 23 hours ago

1.6 trillion, why that's almost as much as the F-35 program!

Comment by timpera 23 hours ago

Great visualization!

Comment by blauditore 1 day ago

All these engineers who claim to write most code through AI - I wonder what kind of codebase that is. I keep on trying, but it always ends up producing superficially okay-looking code, but getting nuances wrong. Also fails to fix them (just changes random stuff) if pointed to said nuances.

I work on a large product with two decades of accumulated legacy, maybe that's the problem. I can see though how generating and editing a simple greenfield web frontend project could work much better, as long as actual complexity is low.

Comment by bob1029 1 day ago

I have my best successes by keeping things constrained to method-level generation. Most of the things I dump into ChatGPT look like this:

  public static double ScoreItem(Span<byte> candidate, Span<byte> target)
  {
     //TODO: Return the normalized Levenshtein distance between the 2 byte sequences.
     //... any additional edge cases here ...
  }

I think generating more than one method at a time is playing with fire. Individual methods can be generated by the LLM and tested in isolation. You can incrementally build up and trust your understanding of the problem space by going a little bit slower. If the LLM is operating over a whole set of methods at once, it is like starting over each time you have to iterate.

Comment by theshrike79 22 hours ago

"Dumping into ChatGPT" is by far the worst way to work with LLMs, then it lacks the greater context of the project and will just give you the statistical average output.

Using an agentic system that can at least read the other bits of code is more efficient than copypasting snippets to a web page.

Comment by bob1029 19 hours ago

> then it lacks the greater context of the project

This is the point. I don't want it thinking about my entire project. I want it looking at a very specific problem each time.

Comment by theshrike79 13 hours ago

But why?

Most code is about patterns, specific code styles and reusing existing libraries. Without context none of that can be applied to the solution.

If you put a programmer in a room and give them a piece of paper with a function and say OPTIMISE THAT! - is it going to be their best work?

Comment by 20 hours ago

Comment by samdoesnothing 1 day ago

I do this but with copilot. Write a comment and then spam opt-tab and 50% of the time it ends up doing what I want and I can read it line-by-line before tabbing the next one.

Genuine productivity boost but I don't feel like it's AI slop, sometimes it feels like its actually reading my mind and just preventing me from having to type...

Comment by jerf 1 day ago

I've settled in on this as well for most of my day-to-day coding. A lot of extremely fancy tab completion, using the agent only for manipulation tasks I can carefully define. I'm currently in a "write lots of code" mode which affects that, I think. In a maintenance mode I could see doing more agent prompting. It gives me a chance to catch things early and then put in a correct pattern for it to continue forward with. And honestly for a lot of tasks it's not particularly slower than "ask it to do something, correct its five errors, tweak the prompt" work flow.

I've had net-time-savings with bigger agentic tasks, but I still have to check it line-by-line when it is done, because it takes lazy shortcuts and sometimes just outright gets things wrong.

Big productivity boost, it takes out the worst of my job, but I still can't trust it at much above the micro scale.

I wish I could give a system prompt for the tab complete; there's a couple of things it does over and over that I'm sure I could prompt away but there's no way to feed that in that I know of.

Comment by CuriouslyC 1 day ago

It's architecture dependent. A fairly functional modular monolith with good documentation can be accessible to LLMs at the million line scale, but a coupled monolith or poorly instrumented microservices can drive agents into the ground at 100k.

Comment by yuedongze 1 day ago

I think it's definitely an interesting subject for Verification Engineering. the easier to task AI to do work more precisely, the easier we can check their work.

Comment by CuriouslyC 1 day ago

Yup. Codebase structure for agents is a rabbit hole I've spent a lot of time going down. The interesting thing is that it's mostly the same structure that humans tend to prefer, with a few tweaks: agents like smaller files/functions (more precise reads/edits), strongly typed functional programming, doc-comments with examples and hyperlinks to additional context, smaller directories with semantic subgroups, long/distinct variable names, etc.

Comment by lukan 1 day ago

Aren't those all things, humans also tend to prefer to read?

I like to read descriptive variable names, I just don't like to write them all the time.

Comment by hathawsh 1 day ago

I think your intuition matches mine. When I try to apply Claude Code to a large code base, it spends a long time looking through the code and then it suggests something incorrect or unhelpful. It's rarely worth the trouble.

When I give AI a smaller or more focused project, it's magical. I've been using Claude Code to write code for ESP32 projects and it's really impressive. OTOH, it failed to tell me about a standard device driver I could be using instead of a community device driver I found. I think any human who works on ESP-IDF projects would have pointed that out.

AI's failings are always a little weird.

Comment by seanmcdirmid 1 day ago

Have you tried having AI build up documentation on the code first and then correct it where it’s understanding is wrong, then running code changes with the docs in the context, you can even separate it out for each module if you are daring. Ai still takes alot of hand holding to be productive with, which means our jobs are safe for now until they start learning about SWe principles somehow.

Comment by manmal 1 day ago

In large projects you need to actually point it to the interesting files, because it has no way of knowing what it doesn’t know. Tell it to read this and that, creating summary documents, then clear the context and point it at those summaries. A few of those passes and you‘ll get useful results. A gap in its knowledge of relevant code will lead to broken functionality. Cursor and others have been trying to solve this with semantic search (embeddings) but IMO this just can’t work because relevance of a code piece for a task is not determinable by any of its traits.

Comment by Yoric 1 day ago

But in the end, do you feel that it has saved you time?

I find hand-holding Claude a permanent source of frustration, except in the rare case that it helps me discover an error in the code.

Comment by manmal 1 day ago

I‘ve had a similar feeling before Opus 4.5. Now it suddenly clicks with me, and it has passed the shittiness threshold, into the „often useful“ area. I suspect that’s because Apple is partnering with Anthropic and they will have improved Swift support.

Eg it‘s great for refactoring now, it’s often updating the README along with renames without me asking. It’s also really good at rebasing quickly, but only by cherry-picking inside a worktree. Churning out small components I don’t want to add a new dependency for, those are usually good on first try.

For implementing whole features, the space of possible solutions is way too big to always hit something that I‘ll be satisfied with. Once I have an idea on how to implement something in broad strokes, I can give a very error ridden first draft to it as a stream of thoughts, let it read all required files, and make an implementation plan. Usually that’s not too far off, and doesn’t take that long. Once that’s done, Opus 4.5 is pretty good at implementing that plan. Still I read every line, if this will go to production.

Comment by divan 22 hours ago

I start new projects "AI-first" – start with docs, and refining them on the go, with multiple CLAUDE.md in different folders (to give a right context where it's needed). This alone increases the chances of it getting tasks right tenfold. Plus I almost always verify myself all the code produced.

Ironically, this would be the best workflow with humans too.

Comment by daliusd 16 hours ago

I use AI successfully in two projects:

* My 5 years old project: monorepo with backend, 2 front-ends and 2 libraries

* 10+ years old company project: about 20 various packages in monorepo

In both cases I successfully give Claude Code or OpenCode instructions either at package level or monorepo level. Usually I prefer package level.

E.g. just now I gave instructions in my personal project: "Invoice styles in /app/settings/invoice should be localized". It figured out that unlocalized strings comes from library package, added strings to the code and messages files (added missing translations), however has not cleaned up hardcoded strings from library. As I know code I have written extra prompt "Maybe INVOICE_STYLE_CONFIGS can be cleaned-up in such case" and it cleaned-up what I have expected, ran tests and linting.

Comment by freedomben 1 day ago

I've tried it extensively, and have the same experience as you. AI is also incredibly stubborn when it wants to go down a path I reject. It constantly tries to do it anyway and will slip things in.

I've tried vibe coding and usually end up with something subtly or horribly broken, with excessive levels of complexity. Once it digs itself a hole, it's very difficult to extricate it even with explicit instruction.

Comment by qudat 1 day ago

Are you using it only on massive codebases? It's much better with smaller codebases where it can put most of the code in context.

Another good use case is to use it for knowledge searching within a codebase. I find that to be incredibly useful without much context "engineering"

Comment by eloisant 23 hours ago

It's also good on massive codebases that include a lot of "good practices" examples.

Let's say you want to add a new functionality, for example plug to the shared user service, that already exist in another service in the same monorepo, the AI will be really good at identifying an example and applying it to your service.

Comment by wubrr 1 day ago

I've generally had better luck when using it on new projects/repos. When working on a large existing repo it's very important to give it good context/links/pointers to how things currently work/how they should work in that repo.

Also - claude (~the best coding agent currently imo) will make mistakes, sometimes many of them - tell it to test the code it writes and make sure it's working - I've generally found its pretty good at debugging/testing and fixing it's own mistakes.

Comment by mrtksn 1 day ago

So far I found that AI is very good at writing the code as in translating english to computer code.

Instead of dealing with intricacies of directly writing the code, I explain the AI what are we trying to achieve next and what approach I prefer. This way I am still on top of it, I am able to understand the quality of the code it generated and I’m the one who integrates everything.

So far I found the tools that are supposed to be able to edit the whole codebase at once be useless. I instantly loose perspective when the AI IDE fiddles with multiple code blocks and does some magic. The chatbot interface is superior for me as the control stays with me and I still follow the code writing step by step.

Comment by bojan 1 day ago

> I work on a large product with two decades of accumulated legacy, maybe that's the problem.

I'm in a similar situation, and for the first time ever I'm actually considering if a rewrite to microservices would make sense, with a microservice being something small enough an AI could actually deal with - and maybe even build largely on its own.

Comment by vanviegen 1 day ago

If you're creating microservices that are small enough for a current-gen LLM to deal with well, that means you're creating way too many microservices. You'll be reminiscing your two decades of accumulated legacy monolith with fondness.

Comment by themafia 1 day ago

> as long as actual complexity is low.

You can start there. Does it ever stay that way?

> I work on a large product with two decades of accumulated legacy

Survey says: No.

Comment by rprend 1 day ago

<1 year old startup with fullstack javascript monorepo. Hosted with a serverless platform with good devex, like cloudflare workers.

That’s the typical “claude code writes all my code” setup. That’s my setup.

This does require you to fit your problem to the solution. But when you do, the results are tremendous.

Comment by silisili 1 day ago

> I work on a large product with two decades of accumulated legacy, maybe that's the problem

Definitely. I've found Claude at least isn't so good at working in large existing projects, but great at greenfielding.

Most of my use these days is having it write specific functions and tests for them, which in fairness, saves me a ton of time.

Comment by moomoo11 1 day ago

You need to realize when you’re being marketed to and filter out the nonsense.

Now I use agentic coding a lot with maybe 80-90% success rate.

I’m on greenfield projects (my startup) and maintaining strict Md files with architecture decisions and examples helps a lot.

I barely write code anymore, and mostly code review and maintain the documentation.

In existing codebases pre-ai I think it’s near impossible because I’ve never worked anywhere that maintained documentation. It was always a chore.

Comment by tuhgdetzhh 1 day ago

Yes, unfortunately those who jumped on the microservices hype train over the past 15 years or so are now getting the benefits of Claude Code, since their entire codebases fits into the context window of Sonnet/Opus and can be "understood" by the LLM to generate useful code.

This is not the case for most monoliths, unless they are structured into LLM-friendly components that resemble patterns the models have seen millions of times in their training data, such as React components.

Comment by manmal 1 day ago

Well structured monoliths are modularized just like microservices. No need to give each module its own REST API in order to keep it clean.

Comment by bccdee 1 day ago

Conversely, poorly-structured microservices are just monoliths where most of the code is in other repositories.

Comment by randomtoast 1 day ago

One problem is that the idea of being "well-structured" has gone overboard at some point over the past 20 years in many companies. As a result, many companies now operate highly convoluted monolithic systems that are extremely difficult to replace.

In contrast, a poorly designed microservice can be replaced much more easily. You can identify the worst-performing and most problematic microservices and replace them selectively.

Comment by tuhgdetzhh 22 hours ago

> One problem is that the idea of being "well-structured" has gone overboard at some point over the past 20 years

That's exactly my experience. While a well-structured monolith is a good idea in theory, and I'm sure such examples exist in practice, that has never been the case in any of my jobs. Friends working at other companies report similar experiences.

Comment by Yoric 1 day ago

I guess that the benefit of monoliths in the context is that they (often) live in distinct repositories, which makes it easier for Claude to ingest them entirely, or at least not get lost into looking at the wrong directory.

Comment by cogman10 1 day ago

Honestly, if you've ever looked at a claude.md file, it seems like absolute madness. I feel like I'm reading affirmations from AA.

Comment by manmal 1 day ago

It’s magical incantations that might or might not protect you from bad behavior Claude learned from underqualified RL instructors. A classic instruction I have in CLAUDE.md is „Never delete a test. You are only allowed to replace with a test that covers the same branches.“ and another one „Never mention Claude in a commit message“. Of course those sometimes fail, so I do have a message hook that enforces a certain style of git messages.

Comment by Havoc 1 day ago

> Never mention Claude in a commit message“. Of course those sometimes fail,

It’s hardcoded into the system prompt which is why your CLAUDE.md approach fails. Ended up intercepting it out via proxy

Comment by manmal 1 day ago

Thanks for this idea!

Comment by HWR_14 20 hours ago

Why would it be bad to mention Claude in a commit message?

Comment by manmal 15 hours ago

Just because Claude ran the commit command, doesn’t mean it wrote the code. That’s just a nasty marketing hack from Anthropic.

Comment by theshrike79 21 hours ago

Way too many agent prompt files are just fan fiction or D&D character background documents that have no actual effect on what the agent does =)

Comment by junkaccount 1 day ago

Can you prove it in a blog and post it here that you do better code snippets than AI. If you claim "what kind of codebase", you should be able to use some codebase from github to prove it?

Comment by gradus_ad 1 day ago

The proliferation of nondeterministically generated code is here to stay. Part of our response must be more dynamic, more comprehensive and more realistic workload simulation and testing frameworks.

Comment by OptionOfT 1 day ago

I disagree. I think we're testing it, and we haven't seen the worst of it yet.

And I think it's less about non-deterministic code (the code is actually still deterministic) but more about this new-fangled tool out there that finally allows non-coders to generate something that looks like it works. And in many cases it does.

Like a movie set. Viewed from the right angle it looks just right. Peek behind the curtain and it's all wood, thinly painted, and it's usually easier to rebuild from scratch than to add a layer on top.

Comment by Yoric 1 day ago

Exactly that.

I suspect that we're going to witness a (further) fork within developers. Let's call them the PM-style developers on one side and the system-style developers on the other.

The PM-style developers will be using popular loosely/dynamically-typed languages because they're easy to generate and they'll give you prototypes quickly.

The system-style developers will be using stricter languages and type systems and/or lots of TDD because this will make it easier to catch the generated code's blind spots.

One can imagine that these will be two clearly distinct professions with distinct toolsets.

Comment by OptionOfT 1 day ago

I actually think that the direct usage of AI will reduce in the system-style group (if it was ever large there).

There is a non-trivial cost in taking apart the AI code to ensure it's correct, even with tests. And I think it's easy to become slower than writing it from scratch.

Comment by Angostura 1 day ago

I just wanted to say how much I like that similie - I'm going to knick it for sure

Comment by wasmainiac 1 day ago

Code has always been nondetermistic. Which engineer wrote it? What was their past experience? This just feels like we are accepting subpar quality because we have no good way to ensure the code we generate is reasonable that wont mayyyybe rm-rf our server as a fun easter egg.

Comment by mort96 1 day ago

Code written by humans has always been nondeterministic, but generated code has always been deterministic before now. Dealing with nondeterministically generated code is new.

Comment by nowittyusername 1 day ago

determinism v nondeterminism is and has never been an issue. also all llms are 100% deterministic, what is non deterministic are the sampling parameters used by the inference engine. which by the way can be easily made 100% deterministic by simply turning off things like batching. this is a matter for cloud based api providers as you as the end user doesnt have acess to the inferance engine, if you run any of your models locally in llama.cpp turning off some server startup flags will get you the deterministic results. cloud based api providers have no choice but keeping batching on as they are serving millions of users and wasting precious vram slots on a single user is wasteful and stupid. see my code and video as evidence if you want to run any local llm 100% deterministocally https://youtu.be/EyE5BrUut2o?t=1

Comment by nazgul17 1 day ago

That's not an interesting difference, from my point of view. The box m black box we all use is non deterministic, period. Doesn't matter where on the inside the system stops being deterministic: if I hit the black box twice, I get two different replies. And that doesn't even matter, which you also said.

The more important property is that, unlike compilers, type checkers, linters, verifiers and tests, the output is unreliable. It comes with no guarantees.

One could be pedantic and argue that bugs affect all of the above. Or that cosmic rays make everything unreliable. Or that people are non deterministic. All true, but the rate of failure, measured in orders of magnitude, is vastly different.

Comment by nowittyusername 1 day ago

My man did you even check my video, did you even try the app. This is not "bug related" nowhere did i say it was a bug. Batch processing is a FEATURE that is intentionally turned on in the inference engine for large scale providers. That does not mean it has to be on. If they turn off batch processing al llm api calls will be 100% deterministic but it will cost them more money to provide the services as now you are stuck with providing 1 api call per GPU. "if I hit the black box twice, I get two different replies" what you are saying here is 100% verifiably wrong. Just because someone chose to turn on a feature in the inference engine to save money does not mean llms are anon deterministic. LLM's are stateless. their weights are froze, you never "run" an LLM, you can only sample it. just like a hologram. and depending on the inference sampling settings you use is what determines the outcome.....

Comment by pegasus 1 day ago

Correct me if I'm wrong, but even with batch processing turned off, they are still only deterministic as long as you set the temperature to zero? Which also has the side-effect of decreasing creativity. But maybe there's a way to pass in a seed for the pseudo-random generator and restore determinism in this case as well. Determinism, in the sense of reproducible. But even if so, "determinism" means more than just mechanical reproducibility for most people - including parent, if you read their comment carefully. What they mean is: in some important way predictable for us humans. I.e. no completely WTF surprises, as LLMs are prone to produce once in a while, regardless of batch processing and temperature settings.

Comment by nowittyusername 20 hours ago

You can change ANY sampling parameter once batch processing is off and you will keep the deterministic behavior. temperature, repetition penalty, etc.... I got to say I'm a bit disappointed in seeing this in hacker news, as I expect this from reddit. you bring the whole matter on a silver platter, the video describes in detail how any sampling parameter can be used, i provide the whole code opensource so anyone can try it themselves without taking my claims as hearsay, well you can bring a horse to water as they say....

Comment by glitchc 1 day ago

Agreed. It's a new programming paradigm that will put more pressure on API and framework design, to protect vibe developers from themselves.

Comment by yuedongze 1 day ago

i've seen a lot of startups that use AI to QA human work. how about the idea of use humans to QA AI work? a lot of interesting things might follow

Comment by hn_acc1 1 day ago

This feels a lot like the "humans must be ready at any time to take over from FSD" that Tesla is trying to push. With presumably similar results.

If it works 85% of the time, how soon do you catch that it is moving in the wrong direction? Are you having a standup every few minutes for it to review (edit) it's work with you? Are you reviewing hundreds of thousands of lines of code every day?

It feels a bit like pouring cement or molten steel really fast: at best, it works, and you get things done way faster. Get it just a bit wrong, and your work is all messed up, as well as a lot of collateral damage. But I guess if you haven't shipped yet, it's ok to start over? How many different respins can you keep in your head before it all blends?

Comment by 1 day ago

Comment by __loam 1 day ago

No thanks.

Comment by adventured 1 day ago

A large percentage (at least 50%) of the market for software developers will shift to lower paid jobs focused on managing, inspecting and testing the work that AI does. If a median software developer job paid $125k before, it'll shift to $65k-$85k type AI babysitting work after.

Comment by mjr00 1 day ago

It's funny that I heard exactly this when I graduated university in the late 2000s:

> A large percentage (at least 50%) of the market for software developers will shift to lower paid jobs focused on managing, inspecting and testing the work that outsourced developers do. If a median software developer job paid $125k before, it'll shift to $65k-$85k type outsourced developer babysitting work after.

Comment by Aldipower 1 day ago

Sounds inhuman.

Comment by quantummagic 1 day ago

As an industry, we've been doing the same thing to people in almost every other sector of the workforce, since we began. Automation is just starting to come for us now, and a lot of us are really pissed off about it. All of a sudden, we're humanitarians.

Comment by Terr_ 1 day ago

> Automation is just starting to come for us now

This argument is common and facile: Software development has always been about "automating ourselves out of a job", whether in the broad sense of creating compilers and IDEs, or in the individual sense that you write some code and say: "Hey, I don't want to rewrite this again later, not even if I was being paid for my time, I'll make it into a reusable library."

> the same thing

The reverse: What pisses me off is how what's coming is not the same thing.

Customers are being sold a snake-oil product, and its adoption may well ruin things we've spent careers de-crappifying by making them consistent and repeatable and understandable. In the aftermath, some portion of my (continued) career will be diverted to cleaning up the lingering damage from it.

Comment by A4ET8a8uTh0_v2 1 day ago

Nah, sounds like management, but I am repeating myself. In all seriousness, I have found myself having to carefully rein some of similar decisions in. I don't want to get into details, but there are times I wonder if they understand how things really work or if people need some 'floor' level exposure before they just decree stuff.

Comment by colechristensen 1 day ago

Yes, but not like what you think. Programmers are going to look more like product managers with extra technical context.

AI is also great at looking for its own quality problems.

Yesterday on an entirely LLM generated codebase

Prompt: > SEARCH FOR ANTIPATTERNS

Found 17 antipatterns across the codebase:

And then what followed was a detailed list, about a third of them I thought were pretty important, a third of them were arguably issues or not, and the rest were either not important or effectively "this project isn't fully functional"

As an engineer, I didn't have to find code errors or fix code errors, I had to pick which errors were important and then give instructions to have them fixed.

Comment by mjr00 1 day ago

> Programmers are going to look more like product managers with extra technical context.

The limit of product manager as "extra technical context" approaches infinity is programmer. Because the best, most specific way to specify extra technical context is just plain old code.

Comment by LPisGood 1 day ago

This is exactly why no code / low code solutions don’t really work. At the end of the day, there is irreducible technical complexity.

Comment by manmal 1 day ago

Yeah, don‘t rely on the LLM finding all the issues. Complex code like Swift concurrency tooling is just riddled with issues. I usually need to increase to 100% line coverage and then let it loop on hanging tests until everything _seems_ to work.

(It’s been said that Swift concurrency is too hard for humans as well though)

Comment by colechristensen 1 day ago

I don't trust programmers to find all the issues either and in several shops I've been in "we should have tests" was a controversial argument.

A good software engineering system built around the top LLMs today is definitely competitive in quality to a mediocre software shop and 100x faster and 1000x cheaper.

Comment by energy123 1 day ago

Nondeterministic isn't the right word because LLM outputs are deterministic and the tokens created from those outputs can also be deterministic.

Comment by Yoric 1 day ago

I agree that non-deterministic isn't the right word, because that's not the property we care about, but unless I'm strongly missing something LLM outputs are very much non-deterministic, both during the inference itself and when projecting the embeddings back into tokens.

Comment by energy123 1 day ago

I agree it isn't the main property we care about, we care about reliability.

But at least in its theoretical construction the LLM should be deterministic. It outputs a fixed probability distribution across tokens with no rng involvement.

We then sample from that fixed distribution non-deterministically for better performance or we use greedy decoding and get slightly worse performance in exchange for full determinism.

Happy to be corrected if I am wrong about something.

Comment by delis-thumbs-7e 23 hours ago

> A very good example of the first category is image (and video) generation. Drawing/rendering a realistic looking image is a crazily hard task. Have you tried to make a slide look nicer? It will take me literally hours to center the text boxes to make it look “good”. However, you really just need to take a look at the output of Nano Banana and you can tell if it’s a good render or a bad one based on how you feel.

The writer could be very accomplished when it comes to developing - I don’t know - but they clearly don’t understand a single thing about visual arts or culture. I probably could center those text boxes after fiddling with them maybe ten seconds - I have studied art since I was a kid. My bf could do it instantly without thinking a second, he is a graphic designer. You might think that you are able to see what « looks good » since, hey you have eyes, but no you can’t. There’s million details you will miss, or maybe feel something is off, but cannot quite say why. This is why you have graphic designers, who are trained to do that to do it. They can also use generative tools to make something genuinely stunning, unlike most of us. Why? Skills.

This is the same difference why the guy in the story who can’t code can’t code even with LLM, whereas the guy who cans is able to code even faster with these new tools. If use LLM’s for basically auto-completion (what transformer models really are for) you can work with familiar codebase very quickly I’m sure. I’ve used it to gen SQL call statements, which I can’t be bothered to type myself and it was perfect. If I try to generate something I don’t really understand or know how to do, I’m lost staring at sole horrible gobbledygoo that is never going to work. Why? Skills.

There is no verification engineering. There is just people who know how to do things, who have studied their whole life to get those skills. And no, you will not replace a real hardcore professional with an LLM. LLM’s are just tools, nothing else. A tractor replaced a horse in turning the field, bit you still need a farmer to drive it.

Comment by louthy 22 hours ago

> You might think that you are able to see what « looks good » since, hey you have eyes, but no you can’t.

I'm sure lots of people will reply to you stating the opposite, but for what it's worth, I agree. I am not a visual artist... well, not any more, I was really into it as a kid and had it beaten out of me by terrible art teachers, but I digress... I am creative (music), and have a semblance of understanding of the creative process.

I ran a SaaS company for 20 years and would be constantly amazed at how bad the choices of software engineers would be when it came to visual design. I could never quite understand whether they just didn't care or just couldn't see. I always believed (hoped) it was the latter. Even when I explained basic concepts like consistent borders, grid systems, consistent fonts and font-sizing, less visual clutter, etc. they would still make the same mistakes over and over.

To the trained eye they immediately see it and see what's right and what's wrong. And that's why we still need experts. It doesn't matter what is being generated, if you don't have expertise to know whether it's good or not, the chances are glaring errors will be missed (in code and in visual design)

Comment by MLgulabio 18 hours ago

I have learned a little bit of photoshop and 10 years ago maya too.

But i'm a software engineere by trade and I do not struggle with telling you that this thing has to move left for reason xy, i would struggle with random tools capable of doing that particular thing for me.

And it does not matter here how i did it if the result is the same result.

In Software Engineering this is just not always the case. Because often enough you would need to verify that what you get is the thing you expect (did the report actually take the right numbers) or Security. Security is the biggest risk to all ai coding out there. Security is already so hard because people don't see it, they ignore it because they don't know.

You have so many non functional requirements in software which just don't exist in art. If i need that image, thats it. Most complex thing here? Perhaps color calibration and color profiles. Resolution.

If we talk about 3D it gets again a little bit more complicated because now we talk the right 3d model, right way to rig, etc.

Also if someone says "i need a picture for x" and is happy about it, the risk is less customers. But if someone needs a new feature and tomorrow all your customer data are exposed or the companies product stops working because of a basic bug, the company might be gone a week later.

Comment by vbezhenar 21 hours ago

> A tractor replaced a horse in turning the field, bit you still need a farmer to drive it.

Before mechanisation, like 50x more people worked in the agricultural sector, compared to today. So tractors certainly left without work a huge number of people. Our society adapted to this change and sucked these people into industrial sector.

If LLM would work like a tractor, it would force 49 out of 50 programmers (or, more generically, blue-collar workers) to left their industry. Is there a place for them to work instead? I don't know.

Comment by jstanley 23 hours ago

Centering text boxes in competent design software is easy because it has a tool to align things to the centre of other things.

For example, Inkscape has this and it is easy to use.

Comment by wongarsu 22 hours ago

Though it's notable that sometimes this will produce "wrong" results because it centers on the geometric middle point of the box, while the correct thing is often more like bringing the center of gravity into the middle

I'm more of a fan of aligning to an edge anyways. But some designers love to get really deep into these kinds of things, often in ways they can't really articulate

Comment by delis-thumbs-7e 22 hours ago

I meant just by eye, mate. But it is pretty bad example anyway, obvs it is something that any program can do better than us. Better would be layout or maybe typography. Even professionals mess it up all the time.

Point is, even basic visual design is far from intuitive.

Comment by nradov 15 hours ago

We literally have self driving tractors now.

https://www.deere.com/en/autonomous/

Comment by HWR_14 20 hours ago

Centering the text on a slide is such a trivial thing. It is the default behavior.

Comment by aryehof 1 day ago

> “AI always thinks and learns faster than us, this is undeniable now”

No, it neither thinks nor learns. It can give an illusion of thinking, and an AI model itself learns nothing. Instead it can produce a result based on its training data and context.

I think it important that we do not ascribe human characteristics where not warranted. I also believe that understanding this can help us better utilize AI.

Comment by WhyOhWhyQ 18 hours ago

"AI always thinks and learns faster than us, this is undeniable now. "

Sort of a nitpick, because what's written is true in some contexts (I get it, web development is like the ideal context for AI for a variety of reasons), but this is currently totally false in lots of knowledge domains very much like programming. AI is currently terrible at the math niches I'm interested in. Since there's no economic incentive to improve things and no mountain of literature on those topics, unless AI really becomes self-learning / improves in some real way, I don't see the situation ever changing. AI has consistently gotten effectively a 0% score on my personal benchmarks for those topics.

It's just aggravating to see someone write "totally undeniable" when the thing is trivially denied.

Comment by jakeydus 16 hours ago

> It's just aggravating to see someone write "totally undeniable" when the thing is trivially denied.

You've described AI hype bros in a nutshell, I think.

Comment by jascha_eng 1 day ago

Verification is key, and the issue is that almost all AI generated code looks plausible so just reading the code is usually not enough. You need to build extremely good testing systems and actually run through the scenarios that you want to ensure work to be confident in the results. This can be preview deployments or other AI generated end to end tests that produce video output that you can watch or just a very good test suite with guard rails.

Without such automation and guard rails, AI generated code eventually becomes a burden on your team because you simply can't manually verify every scenario.

Comment by yuedongze 1 day ago

indeed, i see verification debt outweighing tradition tech debt very very soon...

Comment by jopsen 1 day ago

I would rather write the code and have AI write the tests :)

And I have on occasion found it useful.

Comment by bigbuppo 1 day ago

And with any luck, they don't vibe code their tests that ultimately just return true;

Comment by catigula 1 day ago

I can automatically generate suites of plausible tests using Claude Code.

If you can make as a rule "no AI for tests", then you can simply make the rule "no AI" or just learn to cope with it.

Comment by adxl 1 day ago

I remember a junior dev who thought he was done when his code conpiled without syntax errors.

Comment by kristjank 1 day ago

This feeling of verification >> generation anxiety bears a resemblance to that moment when you're learning a foreign language, you speak a well-prepared sentence, and your correspondent says something back, of which you only understand about a third.

In like fashion, when I start thinking of a programming statement (as a bad/rookie programmer) and an assistant completes my train of thought (as is default behaviour in VS Code for example), I get that same feeling that I did not grasp half the stuff I should've, but nevertheless I hit Ctrl-Return because it looks about right to me.

Comment by yuedongze 1 day ago

> because it looks about right to me

this is something one can look in further. it is really probabilistic checkable proofs underneath, and we are naturally looking for places where it needs to look right, and use that as a basis of assuming the work is done right.

Comment by Yoric 1 day ago

> Maybe our future is like the one depicted in Severance - we look at computer screens with wiggly numbers and whatever “feels right” is the right thing to do. We can harvest these effortless low latency “feelings” that nature gives us to make AI do more powerful work.

Come to think about it... aren't this exactly what syntax coloring and proper indentation are all about? The ability to quickly pattern-spot errors, or at least smells, based on nothing but aesthetics?

I'm sure that there is more research to be done in this direction.

Comment by bitwize 1 day ago

Our future is going to be like the one in Severance: no creation, no craftsmanship, just grooming the grids of numbers that do the actual work.

Comment by pglevy 1 day ago

I've been thinking about something like this from a UI perspective. I'm a UX designer working on a product with a fairly legacy codebase. We're vibe coding prototypes and moving towards making it easier for devs to bring in new components. We have a hard enough time verifying the UI quality as it is. And having more devs vibing on frontend code is probably going to make it a lot worse. I'm thinking about something like having agents regularly traversing the code to identify non-approved components (and either fixing or flagging them). Maybe with this we won't fall further behind with verification debt than we already are.

Comment by cousinbryce 1 day ago

That’s just static code analysis with extra steps

Comment by huflungdung 1 day ago

[dead]

Comment by gijoeyguerra 1 day ago

Make it do TDD. That'll slow it down.

Comment by vjvjvjvjghv 1 day ago

Make it do scrum with sprint planning, retrospectives and sprint demos. A then another AI as product owner and scrum master. Ideally this AI has only a vague idea of what the product needs to or the technology but still has decision power. That should really slow it down.

Comment by theshrike79 21 hours ago

This was done already, they're called "Agent networks" or whatever buzzword the dev decided to give their abomination :)

Comment by yannyu 1 day ago

I think there's a lot of utility to current AI tools, but it's also clear we're in a very unsettled phase of this technology. We likely won't see for years where the technology lands in terms of capability or the changes that will be made to society and industry to accommodate.

Somewhat unfortunately, the sheer amount of money being poured into AI means that it's being forced upon many of us, even if we didn't want it. Which results in a stark, vast gap like the author is describing, where things are moving so fast that it can feel like we may never have time to catch up.

And what's even worse, because of this industry and individuals are now trying to have the tool correct and moderate itself, which intuitively seems wrong from both a technical and societal standpoint.

Comment by 1 day ago

Comment by officerk 20 hours ago

A related post on the topic: https://martin.kleppmann.com/2025/12/08/ai-formal-verificati...

Comment by trjordan 1 day ago

The verification asymmetry framing is good, but I think it undersells the organizational piece.

Daniel works because someone built the regime he operates in. Platform teams standardized the patterns and defined what "correct" looks like and built test infrastructure that makes spot-checking meaningful and and and .... that's not free.

Product teams are about to pour a lot more slop into your codebase. That's good! Shipping fast and messy is how products get built. But someone has to build the container that makes slop safe, and have levers to tighten things when context changes.

The hard part is you don't know ahead of time which slop will hurt you. Nobody cares if product teams use deprecated React patterns. Until you're doing a migration and those patterns are blocking 200 files. Then you care a lot.

You (or rather, platform teams) need a way to say "this matters now" and make it real. There's a lot of verification that's broadly true everywhere, but there's also a lot of company-scoped or even team-scoped definitions of "correct."

(Disclosure: we're working on this at tern.sh, with migrations as the forcing function. There's a lot of surprises in migrations, so we're starting there, but eventually, this notion of "organizational validation" is a big piece of what we're driving at.)

Comment by geldedus 20 hours ago

Sure. You're free to throttle your AI speed or whatever. But don't impose that on me.

Comment by 1 day ago

Comment by diddid 1 day ago

AI can really only be as good as the data it’s trained on. It’s good at images because it’s trained on billions of them. Lines of code, probably 100s of millions, but as you combine those codes into concepts, split by language, framework, formatting etc all you loose the numbers game. It can’t tell you how to make a good enterprise app because almost nobody knows how to make a good enterprise app, just ask Oracle… ba-da-bum!

Comment by karlkloss 1 day ago

That has been said about the world in general. Guess what?

Comment by nirui 19 hours ago

> He would just spot-check the correctness of AI’s work and quickly spin up local deployments to verify it’s indeed working.

I'm not really sure how exactly he get the project done, but "spot-check" and "quickly spin up local deployments to verify" is somehow makes me somewhat unconformable.

For me, it's either unit-tests that hits at least 100% coverage, or when unit-test is inapplicable, a line-by-line letter-by-letter verification. Otherwise your "spot-check" means no shit to me.

Comment by ambicapter 1 day ago

So, we're giving up on the Singularity, then?...Good.

Comment by darylteo 1 day ago

AI: urgh, sick of these escort missions

Comment by wasmainiac 1 day ago

It’s called TDD, ya write a bunch a little tests to make sure your code is doing what it needs to do and not what it’s not. In short, little blocks of easily verifiable code to verify your code.

But seriously, what is this article even? It feels like we are reinventing the wheel or maybe just humble AI hype?

Comment by pxc 21 hours ago

> AI always thinks and learns faster than us, this is undeniable now.

Huh? The LLMs we're using today don't learn at all. I don't even mean that in a philosophical sense— I mean they come "pre-baked" with whatever "knowledge" they have, and that's it.

Comment by awesome_dude 1 day ago

It's like a buffered queue, if the producer (AI) is too fast for the consumer (dev's brain) then the producer needs to block/stop/slow down other wise data will be lost (in this analogy the data loss is the consumer no longer having a clear understanding of what the code is doing)

One day, when AI becomes reliable (which is still a while off because AI doesn't yet understand what it's doing) then the AI will replace the consumer (IMO).

FTR - AI is still at the "text matches another pattern of text" stage, and not the "understand what concepts are being conveyed" stage, as demonstrated by AI's failure to do basic arithmetic

Comment by Sabr0 1 day ago

Ai now is becoming hard to keep up with. We gotta make sure to integrate in our daily lives to not fall behind. I literally began to make it a source of income. Make sure to do the same.

Comment by bamboozled 22 hours ago

I'm starting to come to the realization that unless there is a bottom to the amount of work people want done, it doesn't really matter about AI or not, there just seems to be a never ending supply of work so yeah, not sure how AI would resolve this.

Comment by gaigalas 1 day ago

Prompt engineering: just basic articulation skills.

Context engineering: just basic organization skills.

Verification engineering: just basic quality assurance skills.

And so on...

---

"Eric" will never be able to fully use AI for development because he lacks knowledge about even the most basic aspects of the developer's job. He's a PM after all.

I understand that the idea of turning everyone into instant developers is super attractive. However, you can't cheat learning. If you give an edge to non-developers for development tasks, it means you will give an even sharper edge to actual developers.

Comment by booleandilemma 1 day ago

This is true. I've been anti-ai but I started using it recently as an alternative to stack overflow (because google is shoving it down my mouth via search results). It's pretty effective. It does get things wrong from time to time, but then I just fix it up manually. I can't claim it's making me 100x more productive or anything like that. It's just a nice alternative to scrolling through SO answers and looking for the one with the green checkmark.

I still find it sad when people use it for prose though.

Comment by gaigalas 1 day ago

If an agent gets things wrong you should stop it and correct it instead.

Sometimes the correction will cost more than starting from scratch. In those cases, you start from scratch.

You do things manually only when novel work is required (the model is unlikely to be trained with the knowledge). The more novel the thing you're doing, the more manual things you have to do.

Identifying "cost of refactoring", and "is this novel?" are also developer skills, so, no formula here. You have to know.

Comment by CGMthrowaway 1 day ago

> AI should only run as fast as we can catch up

Good principle. This is exactly why we research vaccines and bioweapons side by side in the labs, for example.

Comment by rogerkirkness 1 day ago

Appealing, but this is coming from someone smart/thoughtful. No offence to 'rest of world', but I think that most people have felt this way for years. And realistically in a year, there won't be any people who can keep up.

Comment by dontlikeyoueith 1 day ago

> And realistically in a year, there won't be any people who can keep up.

I've heard the same claim every year since GPT-3.

It's still just as irrational as it was then.

Comment by adventured 1 day ago

You're rather dramatically demonstrating how remarkable the progress has been: GPT-3 was horrible at coding. Claude Opus 4.5 is good at it.

They're already far faster than anybody on HN could ever be. Whether it takes another five years or ten, in that span of time nobody on HN will be able to keep up with the top tier models. It's not irrational, it's guaranteed. The progress has been extraordinary and obvious, the direction is certain, the outcome is certain. All that is left is to debate whether it's a couple of years or closer to a decade.

Comment by dontlikeyoueith 1 day ago

And there's the same empty headed certainty, extrapolating a sigmoid into an exponential.

Comment by rogerkirkness 1 day ago

I can tell you don't control any resources relating to AI from your contempt alone

Comment by dontlikeyoueith 14 hours ago

You're entitled to be wrong.

Comment by Arainach 1 day ago

People claimed GPT-3 was great at coding when it launched. Those who said otherwise were dismissed. That has continued to be the case in every generation.

Comment by esafak 1 day ago

Are you saying the current models are not good at coding? That is a strong claim.

Comment by Arainach 1 day ago

For brand new projects? Perhaps. For working with existing projects in large code bases? Still not living up to the hype. Still sick of explaining to leadership that they're not magic and "agentic" isn't magic either. Still sick of everyone not realizing that if you made coding 300% faster (which AI hasn't) that doesn't help when coding is less than half the hours of my week. Still sick of the "productivity gains" being subsidized by burning out competent code reviewers calling bullshit on things that don't work or will cause problems down the road.

Comment by dwaltrip 1 day ago

A bit reductive.

Comment by stale2002 1 day ago

> People claimed GPT-3 was great at coding when it launched.

Ok and they were wrong, but now people are right that it is great at coding.

> That has continued to be the case in every generation.

If something gets better over time, it is definitionally true that it was bad for every case in the past until it becomes good. But then it is good.

Thats how that works. For everything. You are talking in tautologies while not understanding the implication of your arguments and how it applies to very general things like "A thing that improves over time".

Comment by umanwizard 1 day ago

Why is the outcome certain? We have no way of predicting how long models will continue getting better before they plateau.

Comment by adventured 1 day ago

They continue to improve significantly year over year. There's no reason to think we're near a plateau in this specific regard.

The bottom 50% of software jobs in the US are worth somewhere around $200-$300 billion per year (salary + benefits + recruiting + training/education), one trillion dollars every five years minimum. That's the opportunity. It's beyond gigantic. They will keep pursuing the elimination of those jobs until it's done. It won't take long from where we're at now, it's a 3-10 year debate, rather than a 10-20 year debate. And that's just the bottom 50%, the next quarter group above that will also be eliminated over time.

$115k + $8-12k healthcare + stock + routine operating costs + training + recruitment. That's the ballpark median two years ago. Surveys vary, from BLS to industry, two to four million software developers, software engineers, so on and so forth. Now eliminate most of them.

Your AI coding agent circa 2030 will work 24/7. It has a superior context to human developers. It never becomes emotional or angry or crazy. It never complains about being tired. It never quits due to working conditions. It never unionizes. It never leaves work. It never gets cancer or heart disease. It's not obese, it doesn't have diabetes. It doesn't need work perks. It doesn't need time off for vacations. It doesn't need bathrooms. It doesn't need to fit in or socialize. It has no cultural match concerns. It doesn't have children. It doesn't have a mortgage. It doesn't hate its bosses. It doesn't need to commute. It gets better over time. It only exists to work. It is the ultimate coding monkey. Goodbye human.

Comment by throw234234234 1 day ago

Amazing how much investment has mostly gone to eliminate one job category; ironically what was meant to be the job of the future "learn to code". To be honest on current trajectory I'm always amazed how many SWE's think it is "enabling" or will be anything else other than this in the long term. I personally don't recommend anyone into this field anymore, especially when big money sees this as the next disruption to invest in and has bet in the opposite direction investment/market wise. Amazing what was just a chatbot 3 years ago will do to a large amount of people w.r.t unemployment and potential poverty; didn't appreciate it at the time.

Life/fate does have a sense of irony it seems. I wouldn't be surprised if it is just the "creative" industries that die; and normal jobs that provide little value today still survive in some form - they weren't judged on value delivered and still existed after all.

Comment by korianders 21 hours ago

>Your AI coding agent circa 2030 will work 24/7

Doing what? What would we need software for when we have sufficiently good AI? AI would become "The Final Software", just give it input data, tell it what of data transform you want and it will give you the output, no need for new software ever again.

Comment by airstrike 1 day ago

> And realistically in a year, there won't be any people who can keep up.

Bold claim. They said the same thing at the start of this year.

Comment by adventured 1 day ago

You're all arguing over how many single digit years it'll take at this point.

It doesn't matter if it takes another 12 or 36 months to make that claim true. It doesn't matter if it takes five years.

Is AI coming for most of the software jobs? Yes it is. It's moving very quickly, and nothing can stop it. The progress has been particularly exceptionally clear (early GPT to Gemini 3 / Opus 4.5 / Codex).

Comment by bdangubic 1 day ago

> Is AI coming for most of the software jobs?

be cool to start with one before we move to most…

Comment by esafak 1 day ago

https://news.ycombinator.com/item?id=46124063

Comment by yuedongze 1 day ago

im hoping this can introduce a framework to help people visualize the problem and figure out a way to close that gap. image generation is something every one can verify, but code generation is perhaps not. but if we can make verifying code as effortless as verifying images (not saying it's possible), then our productivity can enter the next level...

Comment by drlobster 1 day ago

I think you underestimating how good these image generators are at the moment.

Comment by yuedongze 1 day ago

oh i mean the other direction! checking if a generated image is "good" that no one will tell something is off and it look naturally, rather than checking if they are fake.

Comment by cons0le 1 day ago

I directly asked gemini how to get world peace. It said the world should prioritize addressing climate change, inequality, and discrimination. Yeah - we're not gonna do any of that shit. So I don't know what the point of "superintelligent" AI is if we aren't going to even listen to it for the basic big picture stuff. Any sort of "utopia" that people imagine AI bringing is doomed to fail because we already can't cooperate without AI

Comment by ASalazarMX 1 day ago

> I don't know what the point of "super intelligent" AI is if we aren't going to even listen to it

Because you asked the wrong question. The most likely question would be "How do I make a quadrillion dollars and humiliate my super rich peers?".

But realistically, it gave you an answer according to its capacity. A real super intelligent AI, and I mean oh-god-we-are-but-insects-in-its-shadow super intelligence, would give you a roadmap and blueprint, and it would take account for our deep-rooted human flaws, so no one reading it seriously could dismiss it as superficial. in fact, anyone world elite reading it would see it as a chance to humiliate their world elite peers and get all the glory for themselves.

You know how adults can fool little children to do what they don't want to? We would be the toddlers in that scenario. I hope this hypothetical AI has humans in high regard, because that would be the only thing saving us from ourselves.

Comment by vkou 1 day ago

The blueprint should start with a recipe for building a better computer, and once you do that, well, it's humans starting fires and playing with the flames.

Comment by catigula 1 day ago

Why would a "real super intelligent AI" be your servant in this scenario?

>I hope this hypothetical AI has humans in high regard

This is invented. This is a human concept, rooted in your evolutionary relationships with other humans.

It's not your fault, it's very difficult or impossible to escape the simulation of human-ly modelling intelligence. You need only understand that all of your models are category errors.

Comment by ASalazarMX 1 day ago

> Why would a "real super intelligent AI" be your servant in this scenario?

Why is the Bagger 288 a servant to miners, given the unimaginable difference in their strenght? Because engineers made it. Give humanity's wellbeing the highest weight on its training, and hope it carries over when they start training on their own.

Comment by catigula 1 day ago

Category error. Intelligence is a different type of thing. It is not a boring technology.

>Give humanity's wellbeing the highest weight on its training

We don't even know how to do this relatively trivial thing. We only know how to roughly train for some signals that probably aren't correct.

This may surprise you but alignment is not merely unsolved; there are many people who think it's unsolvable.

Why do people eat artificially sweetened things? Why do people use birth control? Why do people watch pornography? Why do people do drugs? Why do people play video games? Why do people watch moving lights and pictures? These are all symptoms of humans being misaligned.

Natural selection would be very angry with us if it knew we didn't care about what it wanted.

Comment by ASalazarMX 1 day ago

> Why do people eat artificially sweetened things? Why do people use birth control? Why do people watch pornography? Why do people do drugs? Why do people play video games? Why do people watch moving lights and pictures? These are all symptoms of humans being misaligned.

I think these behaviors are fully aligned with natural selection. Why do we overengineer our food? It's not for health, because simpler food would satisfy our nutritional needs as easily, it's because our far ancestors developed a taste for food that kept them alive longer. Our incredibly complex chain of meal preparation is just us looking to satisfy that desire for tasty food by overloading it as much as possible.

People prefer artificial sweeteners because they taste sweeter than regular ones, they use birth control because we inherently enjoy sex and want more of it (but not more raising babies), drugs are an overloading of our need for hapiness, etc. Our bodies crave for things, and uninformed, we give them what they want but multiplied several fold.

But geez, I agree, alignment of AI is a hard problem, but it would be wrong to say it's impossible, at least until it's understood better.

Comment by catigula 1 day ago

It seems like you don’t understand reinforcement learning. The signal is reinforced because it correlates to behavior, hacking the signal itself is misalignment.

Comment by Nzen 1 day ago

Did you expect some answer that decried world peace as impossible ? It's just repeating what people say [0] when asked the same question. That's all that a large language model can do (other than putting it to rhyme or 'in the style of Charles Dickens').

[0] https://newint.org/features/2018/09/18/10-steps-world-peace

If you are looking for a vision of general AI that confirms a Hobbsian worldview, you might enjoy Lars Doucet's short story, _Four Magic Words_.

[1] https://www.fortressofdoors.com/four-magic-words/

Comment by chasd00 1 day ago

> So I don't know what the point of "superintelligent" AI is if we aren't going to even listen to it

I would kind of feel sorry for a super-intelligent AI having to deal with humans who have their fingers on on/off switch. It would be a very frustrating existence.

Comment by PunchyHamster 1 day ago

I dunno, many people have that weird, unfounded trust in what AI says, more than in actual human experts it seems

Comment by bilbo0s 1 day ago

Because AI, or rather, an LLM, is the consensus of many human experts as encoded in its embedding. So it is better, but only for those who are already expert in what they're asking.

The problem is, you have to know enough about the subject on which you're asking a question to land in the right place in the embedding. If you don't, you'll just get bunk. (I know it's popular to call AI bunk "hallucinations" these days, but really if it was being spouted by a half wit human we'd just call it "bunk".)

So you really have to be an expert in order to maximize your use of an LLM. And even then, you'll only be able to maximize your use of that LLM in the field in which your expertise lies.

A programmer, for instance, will likely never be able to ask a coherent enough question about economics or oncology for an LLM to give a reliable answer. Similarly, an oncologist will never be able to give a coherent enough software specification for an LLM to write an application for him or her.

That's the achilles heel of AI today as implemented by LLMs.

Comment by chasd00 1 day ago

> The problem is, you have to know enough about the subject on which you're asking a question to land in the right place in the embedding

The other day i was on a call with 3 or 4 other people solving a config problem in a specific system. One of them asked chatgpt for the solution and got back a list of configuration steps to follow. He started the steps but one of them mentioned configuring an option that did not exist in the system at all. Textbook hallucination. It was obvious on the call that he was very surprised that the AI would give him an incorrect result, he was 100% convinced the answer was what the LLM said and never once thought to question what the LLM returned.

I've had a couple of instances with friends being equally shocked when an LLM turned out to be wrong. One of which was fairly disturbing, I was at a horse track and describing LLMs and to demonstrate i took a picture of the racing form thing and asked the LLM to formulate a medium risk betting strategy. My friend immediatately took it as some kind of supernatural insight and bet $100 on the plan it came up with. It was as if he believed the LLM could tell the future.Thank god it didn't work and he lost about $70. Had he won I don't know what would have happened, he probably would have asked again and bet everything he had.

Comment by jackblemming 1 day ago

> is the consensus of many human experts as encoded in its embedding

That’s not true.

Comment by ASalazarMX 1 day ago

Yup, current LLMs are trained on the best and the worst we can offer. I think there's value in training smaller models with strictly curated datasets, to guarantee they've learned from trustworthy sources.

Comment by chasd00 1 day ago

> to guarantee they've learned from trustworthy sources.

i don't see how this will every work. Even in hard science there's debate over what content is trustworthy and what is not. Imagine trying to declare your source of training material on religion, philosophy, or politics "trustworthy".

Comment by ASalazarMX 1 day ago

"Sir, I want an LLM to design architecture, not to debate philosophy."

But really, you leave the curation to real humans, institutions with ethical procedures already in place. I don't want Goole or Elon dictating what truth is, but I wouldn't mind if NASA or other aerospace institutions dictated what is truth in that space.

Of course, the dataset should have a list of every document/source used, so others can audit it. I know, unthinkable in this corporate world, but one can dream.

Comment by cranium 1 day ago

"How to be in good health? Sleep, eat well, exercise." However, knowledge ≠ application.

Comment by potsandpans 1 day ago

I don't believe that this is going to happen, but the primary arguments revolving around a "super intelligent" ai involve removing the need for us to listen to it.

A super intelligent ai would have agency, and when incentives are not aligned would be adversarial.

In the caricature scenario, we'd ask, "super ai, how to achieve world peace?" It would answer the same way, but then solve it in a non-human centric approach: reducing humanities autonomy over the world.

Fixed: anthropogenic climate change resolved, inequality and discrimination reduced (by reducing population by 90%, and putting the rest in virtual reality)

Comment by ASalazarMX 1 day ago

If out AIs achieve something like this, but they managed to give them the same values the minds in Iain Bank's Culture Series had, I think humanity would be golden.

Comment by pessimizer 1 day ago

> Any sort of "utopia" that people imagine AI bringing is doomed to fail because we already can't cooperate without AI

It's just fanfiction. They're just making up stories in their heads based on blending sci-fi they've read or watched in the past. There's no theory of power, there's no understanding of history or even the present, it's just a bad Star Trek episode.

"Intelligence" itself isn't even a precise concept. The idea that a "superintelligent" AI is intrinsically going to be obsessed with juvenile power fantasies is just silly. An AI doesn't want to enslave the world, run dictatorial experiments born of childhood frustrations and get all the girls. It doesn't want anything. It's purposeless. Its intelligence won't even be recognized as intelligence if its suggestions aren't pleasing to the powerful. They'll keep tweaking it to keep it precisely as dumb as they themselves are.

Comment by kaluga 23 hours ago

[dead]