AI's real superpower: consuming, not creating
Posted by firefoxd 4 hours ago
Comments
Comment by bdbdbdb 14 minutes ago
And therefore it's impossible to test the accuracy if it's consuming your own data. AI can hallucinate on any data you feed it, and it's been proven that it doesn't summarize, but rather abridges and abbreviates data.
In the authors example
> "What patterns emerge from my last 50 one-on-ones?" AI found that performance issues always preceded tool complaints by 2-3 weeks. I'd never connected those dots.
Maybe that's a pattern from 50 one-on-ones. Or maybe it's only in the first two and the last one.
I'd be wary of using AI to summarize like this and expecting accurate insights
Comment by gchamonlive 4 minutes ago
Do you have more resources on that? I'd love to read about the methodology.
> And therefore it's impossible to test the accuracy if it's consuming your own data.
Isn't it only if it's hard to verify the result? If it's a result that's hard to produce but easy to verify, a class which many problems fall into, you'd just need to look at the synthetized results.
If you ask it "given these arbitrary metrics, what is the best business plan for my company?" It'd be really hard to verify the result. I'd be hard to verify the result from anyone for that matter, even specialists.
So I think it's less about expecting the LLM to do autonomous work and more about using LLMs to more efficiently help you search the latent space for interesting correlations, so that you and not the LLM come up with the insights.
Comment by block_dagger 3 minutes ago
Comment by purplehat_ 1 hour ago
How are you guys dealing with this risk? I'm sure on this site nobody is naive to the potential harms of tech, but if you're able to articulate how you've figured out that the risk is worth the benefits to you I'd love to hear it. I don't think I'm being to cynical to wait for either local LLMs to get good or for me to be able to afford expensive GPUs for current local LLMs, but maybe I should be time-discounting a bit harder?
I'm happy to elaborate on why I find it dangerous, too, if this is too vague. Just really would like to have a more nuanced opinion here.
Comment by ben_w 46 minutes ago
This does mean that, useful as e.g. Claude Code is, for any business with NDA-type obligations, I don't think I could recommend it over a locally hosted model, even though the machine needed to run a decent local model might cost €10k (with current price increases due to demand exceeding supply), that the machine is still slower than what hosts the hosted models, that the rapid rate of improvement means a 3-month delay between SOTA in open-weights and private-weights is enough to matter*.
But until then? If I'm vibe coding a video game I'd give away for free anyway, or copy-editing a blog post that's public anyway, or using it to help with some short stories that I'd never be able to charge money for, or uploading pictures of the plants in my garden right by the public road… that's fine.
* When the music (money for training) stops, it could be just about any provider whose model is best, whatever that is is likely to still get distilled down fairly cheaply and/or some 3-month-old open-weights model is likely to get fine-tuned for each task fairly cheaply; independently of this, without the hyper-scalers the supply chains may shift back from DCs to PCs and make local models much more affordable.
Comment by empiko 25 minutes ago
Comment by embedding-shape 17 minutes ago
In general for chat platforms you're right though, uploading/copy-pasting long documents and asking the LLM to find not one, but multiple needles in a haystack tend to give you really poor results. You need a workflow/process for getting accuracy for those sort of tasks.
Comment by skydhash 1 hour ago
I take notes for remembrance and relevance (what is interesting for me). But linking concepts is all my thinking. Doing whatever rhe article is prescribing is like sending someone on a tourist trip to take pictures and then bragging that you visited the country. While knowing that some pictures are photoshopped.
Comment by bzmrgonz 57 minutes ago
Comment by kitd 58 minutes ago
Still, all credit to him for creating that asset in the first place.
Comment by impendia 1 hour ago
The AI summary at the top was surprisingly good! Of course, the AI isn't doing anything original; instead, it created a summary of whatever written material is already out there. Which is exactly what I wanted.
Comment by Arisaka1 1 hour ago
This isn't strictly a case against AI, just a case that we have a contradiction on the definition of "well informed". We value over-consumption, to the point where we see learning 3 things in 5 minutes as better than learning 1 thing in 5 minutes, even if that means being fully unable to defend or counterpoint what we just read.
I'm speficially referring to what you said: "the speaker used some obscure technical terminology I didn't know" this is due to lack of assumed background knowledge, which makes it hard to verify a summary on your own.
Comment by lazide 1 hour ago
The 3 things in 5 minutes is even worse - it’s like taking Google Maps everywhere without even thinking about how to get from point A to point B - the odds of knowing anything at all from that are near zero.
And since it summarizes the original content, it’s an even bigger issue - we never even have contact with the thing we’re putatively learning from, so it’s even harder to tell bullshit from reality.
It’s like we never even drove the directions Google Maps was giving us.
We’re going to end up with a huge number of extremely disconnected and useless people, who all absolutely insist they know things and can do stuff. :s
Comment by FridayoLeary 22 minutes ago
Comment by sam_goody 48 minutes ago
I looked up a medical term, that is frequently misused (eg. "retarded"), and asked the Gemini to compare it with similar conditions.
Because I have enough of a background in the subject matter, I could tell what it had construed by its mixing the many incorrect references with the much fewer correct references in the training data.
I asked it for sources, and it failed to provide anything useful. But once I am looking at sources, I would be MUCH better off searching and only reading the sources might actually be useful.
I was sitting with a medical professional at the time (who is not also a programmer) and he completely swallowed what Gemini was feeding him. He commented that he appreciates that these summaries let him know when he is not up to date with the latest advances, and he learnt alot from the response.
As an aside, I am not sure I appreciate that Google's profile would now associate me with that particular condition.
Scary!
Comment by solumunus 38 minutes ago
Comment by ozgung 9 minutes ago
Comment by neom 1 hour ago
Comment by nnnnico 52 minutes ago
Comment by mettamage 1 hour ago
I've written my whole lifestory, the parts I'm willing to share that is, and posted it in Claude. It helped me way better with all kinds of things. It took me 2 days to write without formatting, pretty much how I write all my HN comments (but then 2 days straight: eat, sleep, write).
I've also exported all my notes, but it's too big for the context. That's why I wrote my life story.
From a practical standpoint I think the focus is on context management. Obsidian can help with this (I haven't used it so don't know the details). For code, it means doing things like static and dynamic analysis to see which functions calls what and create a topology of function calls and send that as context, then Claude Code can more easily know what to edit, and it doesn't need to read all the code.
Comment by TN1ck 1 hour ago
Comment by mettamage 33 minutes ago
So yea, I definitely, add to the "AI generated" text part but I read over all the texts, and usually they don't get sent out. Ultimately, it's still a lot quicker to do it this way.
For career planning, so far it hasn't beaten my own insights but it came close. For example, it mentioned that I should actually be a developer advocate instead of a software engineer. 2 to 3 years ago I came to that same thought. I ultimately rejected the idea due to how I am but it is a good one to think about.
What I see now, I think the best job for me would be a tech consultant. Or as I'd also like to call it: a data analyst that spots problems and then uses his software engineering or teaching skills to solve that problem. I don't think that job has a good catch all title as it is a pretty generalist job. I'm currently at a company that allows me to do this but the pay is quite low, so I'm looking for a tech company where I could do something similar. Maybe a product manager role? It really depends on the company culture.
What I also noticed it did better: it doesn't reduce me to data engineering anymore. It understands that I aspire to learn everything and anything I can get my hands on. It's my mode of living and Claude understands that.
So nothing too spectacular yet, but it'll come. It requires more prompt/context engineering and fine tuning of certain things. I didn't get around to it yet.
Comment by sallveburrpi 1 hour ago
I would love to try this out but don’t feel comfortable sharing all my personal notes with a third party.
Comment by adidoit 48 minutes ago
Comment by tigranbs 1 hour ago
Comment by bzmrgonz 1 hour ago
Comment by ratdragon 1 hour ago
Comment by pqs 1 hour ago
Comment by monkeydust 1 hour ago
Comment by heliumtera 21 minutes ago
Comment by embedding-shape 19 minutes ago
Comment by ramigb 57 minutes ago
Comment by fuzzfactor 29 minutes ago
Well for most humans that's the more super of the powers too ;)
Comment by tomgag 1 hour ago
Comment by ramigb 59 minutes ago