Anthropic apologizes for invisible Claude Fable guardrails
Posted by rarisma 6 days ago
Comments
Comment by Avicebron 5 days ago
Fail cleanly. Anything else makes it too difficult to rely on.
edit: Giving the absolute maximum benefit of the doubt, I understand that they see themselves as "stewards," for lack of a better word. But the EA thing is really leaking through, and paternalism isn't a good look.
Comment by Paracompact 5 days ago
Only in the same sense that Standard Oil considered themselves the stewards of petroleum. There's benefit of the doubt and then there's just fanfiction. Do not forget that this most aggressive "guardrail" of theirs was not for any safety reason, but just to stop other labs from catching up to their product. They care less about hindering bioweapons, malware, and hate speech than they do free market competition.
Comment by keeganpoppen 5 days ago
Comment by ryeights 5 days ago
Comment by 16bitvoid 5 days ago
No, it's not because it doesn't exist (yet) and it's further from reach than the other examples. Also, the guardrails are framed as restricting usage for the development of "competing" products/services.
Comment by beepbooptheory 5 days ago
Comment by cnd78A 4 days ago
Comment by mapontosevenths 5 days ago
Imagine your healthcare provider just sometimes decided not to read your test results very carefully and you risked death. Now realize that healthcare providers use Claude, and that scenario wasn't hypothetical.
Comment by largbae 5 days ago
Ah "Mr. Monty Carlo", it says here that you have a UTI, we'll get those kidneys removed ASAP so that won't happen again.
Comment by ceejayoz 5 days ago
I think it's a fundamentally impossible thing to fix, though. There's no 100% correct answer.
Comment by mapontosevenths 4 days ago
That said, this thing is in real production use with war fighters, doctors, and financial experts. Just YOLO'ing to a dumber model midway through a multi-step process and pretending everything is fine is not a real or defensible option. Someone is going to die, and it's going to be the fault of whoever decided to make this the default rather than opt-in.
Personally, I couldn't live with myself.
Comment by bs7280 5 days ago
Other commenters have made good points that these guardrails are counterproductive for well-intentioned cybersecurity work, because I can't use it to test and harden my own software.
Comment by nl 5 days ago
I can sympathize with the argument for the cyber refusals - especially as a temporary measure - especially if Mythos is available to those trying to defend against vulnerabilities.
The LLM development nerfing (and now refusals) is very different though. Anthropic has even said it isn't just for safety reasons:
> Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.
It's at least partially an anti-competitive measure.
The closest analogy is putting measures in a compiler to stop it being able to build other compilers.
Another analogy is priesthoods with secret religious knowledge that "only they are qualified to know".
Comment by dannyw 5 days ago
“The request could assist the development of competing AI models, which is restricted under Anthropic's commercial terms. Benign machine learning work can also trigger this category.”
Source: https://platform.claude.com/docs/en/build-with-claude/refusa...
Comment by thefounder 5 days ago
Comment by antonvs 5 days ago
You’re buying into the hype they’re trying to create here.
Comment by sciencejerk 5 days ago
Anthropic guardrails seem to be more about protecting their business (distillation), than they are about public safety.
Comment by dnautics 5 days ago
Comment by zozbot234 5 days ago
Comment by ACCount37 5 days ago
Comment by senordevnyc 5 days ago
Comment by ericpauley 5 days ago
[1] https://www.anthropic.com/news/detecting-and-preventing-dist...
Comment by dannyw 5 days ago
If Google starts calling ads “Best Links” that doesn’t make it correct nor canonical; the correct term is still ads.
Traditionally, distillation is when you get the actual logits of a model response (not exposed via API for years) and then use that to train a model.
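To make the "traditional" definition concrete, here is a minimal sketch of a logit-based distillation loss. It assumes you have the teacher's raw logits for an input, which is exactly what an API that returns only text does not give you. The function names, the temperature value, and the example logits are all illustrative and not taken from any particular library or lab.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Hypothetical helper: the student is trained to match the teacher's
    full output distribution, not just its top-1 answer.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Made-up logits over a three-token vocabulary.
teacher = [2.0, 1.0, 0.1]
good_student = [1.9, 1.1, 0.1]
bad_student = [0.1, 1.0, 2.0]

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, bad_student))
```

The loss shrinks as the student's distribution approaches the teacher's. Training on sampled text alone only sees the tokens that were emitted, not the probabilities behind them, which is the distinction being drawn here.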
Comment by cherryteastain 5 days ago
Comment by maxdo 5 days ago
The problem is so large-scale that distillation attempts account for a decent share of their token revenue overall.
Comment by sciencejerk 5 days ago
Comment by jackjeff 5 days ago
At least the Chinese have the decency to give back the model weights and not add BS censorship because “it’s too dangerous”.
Comment by cebert 5 days ago
Comment by mcmcmc 5 days ago
Comment by _3u10 5 days ago
Anthropic is offering a commodity product and trying to convince you it isn’t.
It’s even in the name: it’s a myth and a fable. Never happened, doesn’t exist.
Also, I believe that, at least on coding, Qwen is now the frontier model; Fable is its copy of frontier models. In the same way that the Ferrari Luce is an expensive imitation of an SU7 Ultra.
Comment by abletonlive 5 days ago
The delusions people live in just to be a hater.
Comment by ryandrake 5 days ago
Comment by margalabargala 5 days ago
The answer is, the organization making the powerful tool. The people in charge of Anthropic.
Not only that, but they've also written at length about exactly what their opinions and values are: https://darioamodei.com/
You may not agree with the decisions that they make, but they're hardly mysterious. Not something to wonder about.
Comment by Laurel1234 5 days ago
Comment by margalabargala 5 days ago
Comment by Laurel1234 4 days ago
Comment by margalabargala 4 days ago
Comment by Laurel1234 4 days ago
What a disgusting country filled with hollow degenerates. No wonder you keep voting for a senile grifting pedophile.
Comment by margalabargala 4 days ago
As for making money, you're right, it's not one of your values. It takes a special case of main character syndrome to think it's not anyone's value.
Comment by trollbridge 5 days ago
Comment by margalabargala 5 days ago
Again, just because someone has values, doesn't mean they have values you think are good.
Comment by criddell 5 days ago
Comment by CamperBob2 5 days ago
This whole business just keeps getting dumber.
1: https://darioamodei.com/post/policy-on-the-ai-exponential
Comment by solenoid0937 5 days ago
Comment by CamperBob2 5 days ago
> Frontier AI models, like airplanes, should be required to go through technical testing and auditing, and their release should be blocked or reversed as a threat to public safety if they do not meet high standards of safety. I am grateful to see the Trump administration’s Executive Order move incrementally towards a greater role for government in AI, though Anthropic’s proposal recommends even further action.
They are all-but-literally sucking up to the administration that declared their company a supply-chain risk, arguing that the same administration should be given gatekeeping authority over all high-quality LLMs including open-weight releases. Go gaslight somebody else.
Comment by yonaguska 5 days ago
Comment by CamperBob2 4 days ago
My concern is that they won't be pointless in effect. Make no mistake: if Amodei has his way, possession of unvetted model weights will be treated like possession of CSAM is today. And at the same time Amodei calls for that, others are calling for the deployment of technical measures that will make it easier to enforce such laws.
All to the sound of thunderous applause on "Hacker News."
Comment by solenoid0937 5 days ago
Comment by jbm 5 days ago
(Unless they are piping the F1 Mercedes theme song in the announce system at anthropic, in which case maybe you are right)
Comment by solenoid0937 5 days ago
Comment by ben_w 5 days ago
First sentence by itself is mundane "regulators are good", which most people agree with, and also libertarians will object to regardless of leader.
Second sentence is obviously sucking up, though is the same level of sucking up found on every stereotypical LinkedIn post.
Comment by CamperBob2 5 days ago
Comment by solenoid0937 5 days ago
Comment by senordevnyc 5 days ago
Comment by lbreakjai 4 days ago
Comment by senordevnyc 4 days ago
Comment by solenoid0937 5 days ago
Comment by FrustratedMonky 5 days ago
From that paragraph?
Even granting it is sucking up, that is not replacing.
Comment by CamperBob2 4 days ago
If you think this is OK, I'm not sure what led you to a site called "Hacker News," but fortunately there are plenty of others.
Comment by FrustratedMonky 4 days ago
Not sure who you are arguing with, really. There seem to be a few logical leaps between each response. I also didn't say anything like that.
Comment by CamperBob2 4 days ago
Comment by CamperBob2 4 days ago
Comment by arkadiytehgraet 5 days ago
Comment by nl 5 days ago
I don't really agree with their point here, but there are plenty of people in the AI community whose views are aligned with Anthropic's. That doesn't make them shills.
It's actually important those views are put forward.
A place like LessWrong has the opposite problem - there is no one there who questions the "safety narrative" so the discussion swings more and more towards the extreme end of that spectrum.
Comment by antiterra 5 days ago
Comment by CamperBob2 5 days ago
But I tend to agree, just saying it's a "pretty reasonable statement" and leaving it at that is beyond the pale for anyone who doesn't have an undisclosed stake in the argument.
Comment by solenoid0937 5 days ago
That is "pretty reasonable" to most people (except the tech-libertarian crowd).
Comment by CamperBob2 5 days ago
Comment by solenoid0937 5 days ago
Comment by trollbridge 5 days ago
Comment by CamperBob2 4 days ago
Or maybe he has. I don't know. That would be worse.
Comment by whywhywhywhy 5 days ago
Comment by pseudohadamard 5 days ago
Comment by wouldbecouldbe 5 days ago
Comment by pwython 5 days ago
Just before asking for approval to run, it said one thing it wanted to "flag before running" was "Rate-limit and auth testing against prod will generate some 4xx noise in Railway logs and could trip the form rate limiter — harmless, but saying it now."
Ok fine, I said go for it, and it says:
"Running it. Quick recon first (prod URLs + the prior-findings baseline), then I'll fan out the audit tracks with adversarial verification."
Immediately after, I got the Fable warning about how it can't continue because of safety concerns, switching to Opus. In the end, Opus did a good job thanks to whatever Fable suggested doing. Things were fixed that Opus missed in a security/performance audit just the week prior. But what surprised me is that it used 55 agents. Burned 80% of my 5-hour window in 15 minutes (5x Max plan). I've never had Opus do that before on these audits.
Comment by wouldbecouldbe 5 days ago
Comment by notrealyme123 5 days ago
Comment by thefounder 5 days ago
Comment by joe_the_user 5 days ago
The workflow would be: user asks for a thing. If it's a good thing, entity does the thing. If it's a naively bad idea, entity explains why you don't want that. If it's an actually evilly intended request, entity wags its metaphorical finger or could even smite the user.
The problem is that this flow isn't desirable if your entity isn't entirely god-like. It can be bad even if your entity is, in some ways, rather far-seeing.
Comment by dantillberg 5 days ago
Anthropic: Evilness detected. User has been smited.
Comment by jstummbillig 5 days ago
In isolation it's not, but I think it's somewhat lazy to not talk about what they are trying to guard against, when we are supposedly giving the absolute maximum benefit of doubt.
Are we just concluding "their concerns were never real"? Because that probably runs counter to the things that they have been observing and concluding.
Comment by estearum 5 days ago
If you believe Anthropic believes what they say they do, all of it makes sense.
Comment by caconym_ 5 days ago
It's like a prisoner's dilemma where one party is loudly lecturing the other about the obvious benefits of cooperation while also obviously working on defecting. They want to have their cake and eat it too. Maybe they really are the pure-of-heart Chosen Ones destined to lead us around the great filter, but I don't see why I should believe that's the case when their behavior is just as easily explained as maneuvering toward being the winner who takes it all.
Comment by estearum 5 days ago
Yes, this dynamic is exactly the one that anyone who's concerned about AI is concerned about. I don't know why you state this as if it's evidence against the concerns lol. Someone being concerned about the incentives of a situation doesn't de facto make them immune to those incentives, obviously.
The implication that someone who's concerned about an arms race dynamic could simply opt out of the system that produces that dynamic is simply confused about what arms race dynamics are. The entire point is that it's a trap, and it's a trap even if you know it's a trap, and even if you don't like that it's a trap. There's nothing dishonest or hypocritical about being in the trap: it is literally a trap –– that is what it does and why it is bad!
I'm confused by these comments that imply people believe Dario et al are "pure-of-heart Chosen Ones destined to lead us around the great filter." Who? I've never seen it. And any AI-doomer is probably of the opinion that the entire question of Dario's or anyone else's personal moral character is 99% irrelevant. Because, again, it's a trap. The dynamics at play are so much larger than whether someone irks people for their lecturing tone. I would much rather give my money to Dario, who seems like a generally good person, versus Sama, who seems like a complete snake, but I'm under no illusions that doing so changes the fundamental dynamics that are steering us to AI doom. I doubt anyone does.
And yes, obviously they are angling toward being the winner who takes it all. That is literally the trap. If you believe what they believe, yelling "let's cooperate!" while hurtling towards the finish line and tripping your competitors is the only reasonable thing to do. That is the problem.
Comment by senordevnyc 5 days ago
Comment by caconym_ 5 days ago
I think you're reading some subtext into my comment that I didn't intend. Knowing myself, I assume the scare quotes there are just a bit of casual irony re: the insanely high stakes here. The word "concerns" as used by previous commenters doesn't seem equal to the context.
> The implication that someone who's concerned about an arms race dynamic could simply opt out of the system that produces that dynamic is simply confused about what arms race dynamics are.
You can, in fact, opt out. You can opt out and do your damndest to stop what's happening, throw every cent you have at it, bend any ear that will listen, make use of the fact that your voice (as Anthropic leadership) has some meaningful weight.
If you really believe that we are heading down a path that's likely to end poorly for most or all of humanity, and you are the kind of person who's inclined to favor saving billions of lives over saving your own skin when the stakes are still relatively distant, abstract, and generally unclear, opting out is obviously on the table as a grand gesture that burns your position in the race to show just how fucking serious you are. The sense of inevitability your comment shares with many others does not seem well founded---we have, for instance, not had a global nuclear war yet. Leaders in the 20th and 21st centuries have shown remarkable restraint.
If today's political and tech leaders are unable to think beyond this inevitability, for whatever reason, the worst outcomes essentially become a self-fulfilling prophecy to the extent that reality bears them out.
---
But yes, these people are acting the way they are for obvious reasons, obviously. My previous comment is reacting to the general disagreement over whether Anthropic actually believes what they say about safety, etc., or whether it's a marketing gimmick. The purpose of my comment is to explain that "it's hard not to be cynical" about actions taken by very rich and powerful people that are claimed to be in everybody's best interests but are indistinguishable from the actions they would take to maximize their future power and wealth. I think everyone ought to agree with that statement. It's not a value judgment; it's simply an observation of how it feels to be on a plane whose pilot appears to be robbing the passengers (including you) at gunpoint and is conspicuously wearing the only parachute on board.
Comment by SubiculumCode 5 days ago
Comment by caconym_ 5 days ago
I'll mention again the nuclear analogy. It is, believe it or not, possible for great powers, and even adversary great powers, to agree to limit the development and proliferation of dangerous technologies.
> The main secret is out of the bag.
This is not something you can do in a shed with a handful of GPUs just because you know "the main secret". To build something like Mythos you need tens of billions of dollars, massive amounts of power, enormous buildings filled with specialized bleeding edge computer chips that are made by (optimistically) a handful of companies with deep government ties. You need free access to all the intellectual property that humans have created and posted openly on the internet. You need all of this at each step, and to take each next step you (or somebody) needs to have taken the previous one.
For now, there are a million ways for a government to pump the brakes on this cycle.
Comment by SubiculumCode 4 days ago
Comment by caconym_ 4 days ago
What is the difference? It's easier to make money with the AI you get at each incremental step toward potentially destroying human civilization (though, of course, it's debatable whether these companies really are making money as such).
So what? You are implicitly arguing that human civilization will be unable to resist engaging in a large-scale, coordinated effort to destroy itself, just to make a few bucks along the way. Is this true? I don't know. The point of the nuclear analogy is that we have previously shown that we can, under certain conditions, put the eschaton back on the shelf for some period of time, despite very real pressure to take more incremental steps toward doom. "But AI can write code" is not a refutation of the possibility that we could take a more measured approach to AI development.
Comment by SubiculumCode 4 days ago
There may or may not be such a point with AI: a point at which ever smarter machines provide no marginal benefit to security. If that happens, I do expect agreements, should any one of us still exist.
Comment by estearum 5 days ago
You were literally just criticizing Anthropic as disingenuous for begging for this. Or is your position that people other than those near the front of the race can agree to limit development? And if so: provide evidence.
Note also a key ingredient that makes nuclear non-proliferation possible is that they're pretty much useless weapons. There is no smaller order nuke that's dramatically more useful than a large conventional weapon. That's not true of AI models, which appear to be monotonically useful as they become more powerful.
Comment by estearum 5 days ago
There are billions of people who have opted out of playing the game. Has the game stopped? Has any game stopped because the people not playing it decided that it ought to? Only with government intervention, which is exactly what you just criticized Anthropic for being disingenuous for requesting.
Is your position that they should just be smart bloggers asking for regulation, instead of the preeminent lab asking for regulation, and that would be either more ethical or more effective? If it's less effective, isn't it de facto less ethical?
What say you about the thousands of smart bloggers asking for regulation who are ignored every single day and have no tools besides their blogs to steer the outcome?
> burn your position in the race to show just how fucking serious you are.
This is incredibly naive. Literally no one who is unconvinced of AI doom would be convinced by this... because they already don't believe the premise. Such a gesture would be readily explained away as "you were losing the race," or "you got rich enough already." This is the attitude when any individual opts out of participation (see: Hinton) and it's ridiculous to assume it'd be different if an entire company did it.
Not to mention that an entire company can't do it. These companies have boards of directors. They are accountable to shareholders. A CEO who wanted to do this would simply be fired and the company would carry on. This is one of the key components of the trap. Large companies are not under the control of people but of incentives. They are literally deliberately designed not to be under the control of individuals –– to be immune to exactly the type of behavior you think is possible.
And yes, nuclear weapons are analogous to AI in the arms race dynamic to create and proliferate them. They are probably not analogous in that there exists a stable equilibrium in nuclear weapons due to "accidents" of their nature. There need not be a similar equilibrium among competitive AIs.
----
And yes, your comment lands in exactly the category I mentioned. You do not believe the AI doom fears, so the behavior looks one way. I do believe in the AI doom concerns, so the behavior looks another way. This applies equally to yesterday's actions, today's actions, and tomorrow's, including some hypothetical honorable self-immolation to slow progress: I would see that as concordant with AI doom concerns, you would not. You would find it "hard not to be cynical" about the fact they already earned a ton of money, maybe were losing a bit of ground in the race, so on and so forth. This is plainly obvious to anyone who has had to converse with no-doomers, who can only analyze other people's behaviors under their own belief system, so it won't happen.
The only variable of disagreement is around AI doom.
Comment by caconym_ 4 days ago
> The only variable of disagreement is around AI doom.
The source of our disagreement seems to be your belief that somebody can either a) believe "AI doom" is inevitable, or b) not believe it's possible. This is an obvious false dichotomy that's stunting your ability to engage effectively with what I've written, and also stunting your ability to understand the broader landscape of the issue. You are projecting this dichotomy onto everybody involved and understanding their behavior in that way, which is leading you to make other reductive and honestly bizarre claims—like, for instance, the idea that a sudden change of course from Dario Amodei at this moment in time would be broadly perceived as somebody who was already losing the race cashing out his chips. If you really do believe that, I have to assume it's because you're modeling your hypothetical observers as falling into one of your two extreme mindsets and assuming Dario, being a smart guy who knows a lot, thinks the same way. I believe it is—yes, I'm ready for the dopamine hit—naive to assume all people or even most people fall into one of these two camps. Naive, at best. Yours is a self-limiting framework for thinking about this stuff.
I encourage you to broaden your thinking and engage in less projection and ad hominem stuff in discussions like this. I probably won't reply to whatever you post next unless you can do a better job writing a substantive reply to what I've written here.
Comment by estearum 4 days ago
I actually already know how you'd perceive Dario's opting out of the race because I already know how you perceive Dario's requests for regulation, which is the milder version of the same logic, and is vulnerable to the same cynical allegations of self-serving, which you've already expressed.
Comment by jcgrillo 5 days ago
Comment by reducesuffering 5 days ago
Comment by jcgrillo 5 days ago
Generally, in the past when tech companies have made outlandish claims that were not backed by evidence, they're later found out to have lied. This is an ancient pattern going back to the dotcom era and before, but for recent examples you need only look back a few years to the web3 era. If they're not lying, they can show it by producing the results they claim. Until then, they're probably just lying.
Comment by estearum 5 days ago
> If they're not lying, they can show it by producing the results they claim. Until then, they're probably just lying
Brilliant framework: Anyone making claims about the future is not just speculating, not just wrong, but they are lying.
Comment by jcgrillo 5 days ago
Comment by estearum 5 days ago
My company itself (possible only with AI) does the work of at least several dozen people across my hundred customers or so. Those jobs are now automated away.
Does that mean you're lying, or just overly confident (and wrong) in your speculations?
FWIW, I wouldn't put Sam Altman in the category of "earnest." I'm not sure if you just aren't aware that Anthropic and OpenAI are different companies, or if you're arguing dishonestly by trying to put sama quotes in here? But weird move in either case!
Comment by jcgrillo 5 days ago
Comment by estearum 5 days ago
I'm not sure there's anything "to get." But given your level of curiosity it's not surprising.
Web3 was absolute horseshit (and always was), so if you're blending AI with that based on the similarly grandiose claims and the extremely annoying boosterism, I think you should try hitting reset and engaging with LLMs from a cleaner slate.
Comment by shimman 5 days ago
IMO they are using the cult messaging to distract the public so they take out all the oxygen in the room regarding people that care about the immediate impacts (climate exacerbation, ease of scamming, degrading job prospects, increasing income inequality).
Whenever real concerns are brought up against these companies they are always ignored while claiming the real concern is the fantasy of a machine god turning into skynet.
Comment by estearum 5 days ago
If they believe they're creating "a machine god" and that it's better it's their machine god than someone else's (which, given the other contenders, I tend to agree with), then all the corollaries you mention are mostly irrelevant.
Whether you believe they're creating a machine god is irrelevant. They believe that they are. It would be helpful if you could create an actually good argument for why they cannot or are not creating a machine god, but it turns out there are no good arguments for why it's impossible to do so. And so... they shall try.
Comment by Dylan16807 5 days ago
Companies don't have to do that. If they're getting into actually dangerous territory, they can stop as soon as they want to.
Comment by estearum 5 days ago
If you don't believe that, or you don't believe that the frontier labs believe that, then sure, it makes no sense. But they probably do. The people at these companies literally dedicated their lives to building this specific thing that, up until people had to make tradeoffs between "that looks risky" and "that looks useful", virtually everyone agreed would be a dangerous technology.
What apparently many people on HN failed to appreciate is that the thing that makes it dangerous is the fact that it grows in utility.
Comment by FrustratedMonky 5 days ago
Comment by Dylan16807 4 days ago
Comment by estearum 4 days ago
Comment by paulhebert 5 days ago
Arms races always work out great for arms dealers. Less so for the average Joe.
Comment by estearum 5 days ago
Comment by shimman 5 days ago
Good to know.
Comment by thewebguyd 5 days ago
Because from the outside, their behavior looks like a situation of "What if Microsoft/Apple put controls in place to make it impossible to develop an operating system using their OS?"
Comment by estearum 5 days ago
Unlike nuclear weapons, advancing in this arms race requires actually deploying the product over and over again. Deploying the product makes your advancements visible to your competitors.
It makes complete sense to try to limit the degree to which that's true.
Comment by sobellian 5 days ago
The nuclear 'race' was based on the premise that the winner could use it to destroy all other racers (a faulty assumption, see the USSR among others). I will charitably assume Anthropic does not intend to literally destroy anyone and merely wants to become an AGI monopoly. But if AGI is so powerful, any monopoly would not be stable since the incentives for entry into the market are massive. Why would China stop developing AGI just because Anthropic has it?
Comment by estearum 5 days ago
or is it more similar to the Cold War, where there were obviously competitors engaged in the race?
And yes, agreed the equilibrium dynamics for AGI are very different (and far harder to predict) than nukes. That sounds like a good reason to be sure we get there first, since presumably any potential advantage wouldn't go to the second- or third-place finishers.
Comment by sobellian 5 days ago
Comment by estearum 5 days ago
"Ability to literally destroy the other entity" is not a necessary or even typical feature of arms races.
Comment by sobellian 5 days ago
Comment by estearum 5 days ago
It seems that the frontier labs believe they're participants in a winner-take-all market. Therefore they're in "an arms race."
Winner-take-all markets do not require that the winner literally destroys the losers, but only that the winner enjoys disproportionate returns compared to their actual superiority.
Whether or not this is actually true is TBD, but I think you're naive to think the frontier labs do not believe this to be true.
Comment by sobellian 5 days ago
As far as naivete, wouldn't it be more naive to take their EA claims at face value, rather than the more realistic assumption that they like money?
Comment by estearum 5 days ago
You're pretty explicitly saying that dominating the competition is not the type of "destruction" necessary to qualify as an arms race.
> As far as naivete, wouldn't it be more naive to take their EA claims at face value, rather than the more realistic assumption that they like money?
Huh? Greed is – quite obviously – the major driving force behind the arms race. That is not a mitigation whatsoever.
Comment by sobellian 5 days ago
Comment by zozbot234 5 days ago
Comment by estearum 5 days ago
> Whether or not this is actually true is TBD, but I think you're naive to think the frontier labs do not believe this to be true.
Comment by Terr_ 5 days ago
P.S.: On reflection, it's even worse than that, because it'd trigger based on anything the user types or reads on any site. Someone mentions a "critical rendering path" and now you can't participate on that thread in the Blender forums.
Comment by jstummbillig 5 days ago
Let's just assume it was "only" that?
It's unreasonable to assume they are aiming to upset people who are just giving them money in the way they want. It makes no business sense, for any company. So that has to be a byproduct.
Model training is one of the more expensive undertakings in the world right now and distilling models from competitors against the TOS is apparently something that is going on for very little money. Why would they not "just" try to take measures against that?
Comment by thewebguyd 5 days ago
All they had to do was have a simple, transparent output "Sorry, that request is against our terms of service. This session has been terminated"
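As a rough sketch of what that could look like in a multi-step workflow: the refusal is raised as an explicit error that halts the run, rather than the step being quietly handed to a different model. Everything here (`GuardrailRefusal`, `run_pipeline`, the `is_blocked` check) is hypothetical and only illustrates the shape of the behavior; it is not Anthropic's API or how their safeguards are actually implemented.

```python
class GuardrailRefusal(Exception):
    """Raised when a request is blocked, so the caller cannot miss it."""


def run_pipeline(steps, is_blocked, run_step):
    """Run steps in order and stop loudly at the first blocked one.

    `is_blocked` and `run_step` are caller-supplied stand-ins for the
    provider's policy check and the model call (both assumptions).
    """
    results = []
    for index, step in enumerate(steps):
        if is_blocked(step):
            # Fail cleanly: no silent downgrade, no partial pretend-success.
            raise GuardrailRefusal(
                f"step {index} refused: request is against the terms of "
                "service; session terminated"
            )
        results.append(run_step(step))
    return results


def demo_is_blocked(step):
    # Toy policy for illustration only.
    return "exploit" in step


def demo_run_step(step):
    # Toy "model call" for illustration only.
    return step.upper()


try:
    run_pipeline(["scan config", "write exploit", "report"],
                 demo_is_blocked, demo_run_step)
except GuardrailRefusal as err:
    print(err)
```

The point of the design is that the caller decides what happens after a refusal (retry, escalate, abort) instead of discovering later that some steps ran under different conditions.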
Comment by zozbot234 5 days ago
Comment by solenoid0937 5 days ago
The vast majority of frontier research is about how to build better models, not about alignment.
Comment by zozbot234 5 days ago
Comment by whimsicalism 5 days ago
Comment by thewebguyd 5 days ago
All this longtermism though is harmful. There are real problems of data theft, bias, labor displacement, and environmental costs that are happening right now but every push for regulation and regulatory capture, and all the safety talk, is always focused on some speculative future machine god to distract from the current problems.
I'd have a higher opinion of these labs if the issues they openly talked about and worked toward were the real issues we face currently, not speculative defenses against some future AGI that may never happen in my lifetime. I'm less worried about "our new model might kill all humans in the future" and more worried about how we are going to address anti-competitive behavior, copyright protections, labor rights, and the energy impact.
Comment by whimsicalism 5 days ago
Honestly, that respect for 'copyright protections' has somehow become a leftist shibboleth is bizarre to me and indicative that something has become deeply warped in our discussions around this topic.
Comment by nozzlegear 5 days ago
Frankly, this appeal comes across as the same kind of impassioned plea that a missionary might make when begging the faithless to repent and come to Christ before it's too late. This weird religiosity some people around here use to talk about AI, ASI and AGI is bizarre. Take what I've quoted and replace the words "progress" and "ASI" with "sinning" and "the Book of Revelations", and the zeal becomes apparent.
Comment by whimsicalism 5 days ago
Comment by thewebguyd 5 days ago
Outside of that though, there are other issues right now that need addressed before we speculate about what might be possible with ASI in the future. If the potential for a harmful ASI is truly that near, and that great, then why push forward at all? Where's the push for a global stop order on development of this technology until regulation can catch up?
The talk of a potential future serves as a distraction from the very real problems people are facing in their lives today.
While Dario and team are worrying about ASI, real people are worrying about how they are going to continue to feed their family after widespread layoffs set a very large portion of the population back into a lower quality lifestyle. Real people are concerned about water usage in drought-stricken areas, the massive energy demand driving grid instability in their communities, or that the environmental and economic externalities of model training are being socialized while the profits continue to be strictly private.
What about the mass proliferation of misinformation at scale having a real effect on our democratic process?
Forgive me if I'd like to see those addressed first, and fast, before we start worrying about an unpromised future technology.
Comment by oncensher 5 days ago
Comment by thewebguyd 5 days ago
That being said, I can't help but experience a bit of deja vu over arguments like those around biorisk. I've seen the same exact things said in the early 2000s over widespread access to broadband and Google. When the Anarchist Cookbook spread around online and everyone was super paranoid about democratized terrorism, and we had big regulatory pushes for ISP level censorship and user tracking. Telecoms frequently argued that only they can keep the web safe, with strict and expensive regulations that naturally only those large heavily capitalized companies can afford to go through. Like the early internet and search, it's just another way to lower the latency required for a human to find already existing public data.
Well, very little of that played out. Turns out the math, for now, is the same, and information retrieval doesn't directly correlate to democratized weaponization. In 2001, a bad actor still needed a physical lab, precursor chemicals, etc to build a physical threat. Those same exact physical constraints exist today. The software cannot yet cross the digital-to-physical divide.
Keep an eye on the risk, by all means, but I don't see it yet as justification to cement a monopoly or oligopoly, nor do I see it as a reason to prioritize a risk of information availability over the climate and environmental risks that are far more likely to end the species.
Comment by simoncion 5 days ago
If you have a sizeable bucket of money, it's so, so easy to get folks so distracted by (or invested in) movie plot threats that they totally fail to (or have a "plausible" excuse to fail to) notice the actual, lasting harm that you're doing to society at scales both small and large.
If Anthropic had pushed hard and nonstop since their founding to ensure that all LLM companies in the world were legally bound to stop all LLM development the minute any one of them called for a halt to work, then I'd give their claims about safety some credit. They've been screaming about "safety" and "alignment" for years, but -because LLMs are impossible to secure against code injection- their products are fundamentally unsafe and always have been... I just don't trust their claims about a commitment to actual safety.
My read on their recent calls for a global "stop work" emergency cord is that they're very soon to (if they haven't already) reach a point where they will not be able to produce products that are sufficiently improved over the previous versions to justify the level of investment required for their development.
My prediction is that Anthropic and OpenAI will get serious barriers to entry of new competitors enshrined in Federal law, they will call for a "pause" or a "slowdown" in new research for "safety" reasons, and the US will attempt to engage in economic warfare with any countries that don't agree to force their domestic LLM companies to stop working on those LLMs.
Comment by 8note 5 days ago
unless the bitter pill is gone, extraordinarily not this. The capabilities will be limited by the training data we can create to pull information and patterns from
and then we will still be limited by compute, space, and power
mass devaluing of labour isnt particularly believable when everyones predicting that all the big labs are gonna go under trying to subsidize tokens.
Comment by 8note 5 days ago
power consumption and global climate change should
ASI should be in the top 10k concerns maybe, but way below what to eat for dinner.
much higher on the fears is some hype guy pretending he has made this thing, and giving it access to too much stuff, which it then randomly deletes or misuses
it should also be in the same range as "what if the dinosaurs came back and ate everyone"
theres tons of progress on that too. same with finding aliens
there are real present concerns to worry about, like genocides, concentration camps for immigrants, food costs next winter, ongoing wars in the middle east and europe, etc
all kinds of actually pressing stuff, that doesnt first require burning a couple trillion dollars and forcing poor people to pay through the teeth for their electricity
Comment by zozbot234 5 days ago
Comment by pishpash 5 days ago
Comment by calgoo 5 days ago
Comment by dpkirchner 5 days ago
Their concerns are probably real but I don't think they're being totally transparent about their concerns. They don't want to be subject to regulation (until they have captured the regulator) -- same as every behemoth.
Comment by esafak 5 days ago
Comment by colordrops 5 days ago
Comment by hootz 5 days ago
Comment by photochemsyn 5 days ago
> “ ‘He is a prodigy,’ he said at last. ‘He is an emissary of pity and science and progress, and devil knows what else. We want,’ he began to declaim suddenly, ‘for the guidance of the cause entrusted to us by Europe, so to speak, higher intelligence, wide sympathies, a singleness of purpose.’ . . .You are of the new gang - the gang of virtue. ”
The real underlying motivation is that you can more easily get away with shady business practices if you cloak them in the language of great moral works selflessly undertaken for the benefit of mankind. Historical evidence tends to show the opposite outcome, but still, new generations unfamiliar with history will repeat this stuff with starry-eyed enthusiasm.
> “There had been a lot of such rot let loose in print and talk just about that time, and the excellent woman, living right in the rush of all that humbug, got carried off her feet. She talked about ‘weaning those ignorant millions from their horrid ways,’ till, upon my word, she made me quite uncomfortable. I ventured to hint that the Company was run for profit.”
Now the horrid millions are users of LLMs who submit morally dubious prompts and who must be gently steered back into the path of correct thought by suitable backroom manipulation, rather than direct rejection of the request.
Comment by massagedpelican 5 days ago
Comment by paytonjjones 5 days ago
Comment by oncensher 5 days ago
But there are also people who just oppose utilitarianism, like G.E.M. Anscombe. For instance, in https://integrityproject.org/wp-content/uploads/2015/07/mr_t..., she seems to grant that dropping the nuclear bombs on Japan was probably good from a utilitarian perspective (because it saved lives overall) and also to grant that bombing campaigns that necessarily entail massive civilian deaths (including, apparently, area bombing German cities) are morally permissible but still to argue that dropping the nuclear bombs was impermissible because it constituted murder ("intentionally" killing the innocent). But this kind of distinction, which I think is what actual anti-utilitarianism must come to, is hard to even consistently maintain, and I suppose many HN readers would find the effort quixotic.
Comment by mswphd 5 days ago
It is relatively easy to take the proceeds of a massive fraud, buy a relatively small (as a percentage of the fraud) $ amount of mosquito nets, and save more lives than the lives impacted by your massive theft. Is this a correct application of the utilitarian calculus? What sort of data would we need a priori to do this calculation "correctly"? Do you think he had a careful estimate of the suicide rate of victims of ponzi schemes before perpetrating the fraud, or would any suicide rate have made the decision net [pun intended] moral, as any such victim of fraud would lead to >> 1 net purchased (so you would almost always net save lives).
The above is of course snarky. It is also a best-effort way of analyzing a notable utilitarian's actions. I do not think it would be difficult at all to use this type of argument to argue that SBF's actions net raised utility in the world. If only we all would become fraudsters, then we could truly live in Omelas --- a notable utilitarian paradise.
Comment by oncensher 5 days ago
Now if we look at EA, the basic tenet of EA seems obvious -- basically just utilitarianism. And from what I've seen, in practice also, EA is a pretty big tent. I don't know the specifics of SBF's case, but I think essentially no one thinks that he acted correctly. I don't know how many mosquito nets he bought, but I agree that if he bought enough, it might be that he net raised utility, and if that is so, it's something to be thankful for. But it doesn't make him some kind of utilitarian saint unless he couldn't have done even more good by some other course of action that wouldn't have hurt the ponzi scheme victims and brought opprobrium on the whole EA movement.
Comment by mswphd 5 days ago
I think this being a reasonable utilitarian point to make is not a point in utilitarianism’s favor.
Comment by paytonjjones 5 days ago
Comment by paytonjjones 5 days ago
Comment by whimsicalism 5 days ago
Comment by 8note 5 days ago
Comment by iamacyborg 5 days ago
Comment by whimsicalism 5 days ago
Comment by notahacker 5 days ago
I don't think people are objecting to the EA idea that some charities are more evidence based than others so much as the distinctly EA idea that it would be more effective still to donate to charities like OpenAI
Comment by tancop 5 days ago
now its utilitarianism taken to the extreme. if you believe a skynet scenario killing everyone on earth is plausible then the "logical" thing to do is allow literally anything in the name of stopping it. that includes mass murder and dictatorship. the only thing that can balance the infinite negative value from an evil machine god is the infinite positive value from a good machine god.
thats the main difference today, one faction around sam and dario believes in creating the good ASI first and sacrificing all the world resources to do it before someone makes the bad one, the more pessimistic like yud want to stop all ai development to reduce the risk that an evil god is made to zero.
at this point its basically a religion.
Comment by optimalsolver 5 days ago
Comment by 8note 5 days ago
isnt that literally his thing since the 90s or something?
Comment by mrits 5 days ago
Comment by optimalsolver 5 days ago
Comment by palata 5 days ago
I understand how one may wonder if there was a way to do that, but it feels insane to me that one would actually conclude that "yes, it is possible". We have examples everywhere showing that it is generally impossible to define a metric that correctly represents the underlying concept we want to measure.
Said differently, I feel like Effective altruism fundamentally starts by saying "I don't believe in Goodhart's law". Which seems intellectually dishonest to me.
Comment by PoignardAzur 4 days ago
> Look. I’m the last person who’s going to deny that the road we’re on is littered with the skulls of the people who tried to do this before us. But we’ve noticed the skulls. We’ve looked at the creepy skull pyramids and thought “huh, better try to do the opposite of what those guys did”.
https://slatestarcodex.com/2017/04/07/yes-we-have-noticed-th...
Comment by palata 4 days ago
"Look. I see that it doesn't work. I want it to work, so I will continue trying, even if it fundamentally cannot work. I am not interested in thinking about whether or not it can work. I am interested in showing to the world that I am well-intentioned and trying to do something, even if that something doesn't make sense".
Comment by PoignardAzur 3 days ago
The longer non-snappy explanation is that "I will arbitrarily set numbers on things and call it impartial" obviously doesn't match EA's self-conception, that lots of EA cause areas are speculative and don't focus on numbers, that EAs that do focus on numbers do a lot of work to make sure the numbers aren't arbitrary, that EAs as a general rule don't claim to be impartial, and that awareness of Goodhart's law doesn't mean "never trying to objectively measure anything at all".
> I am interested in showing to the world that I am well-intentioned and trying to do something, even if that something doesn't make sense".
This is the kind of pre-conception that's essentially immune to reality. I hear the same thing about vegans (oh they say they care about animal suffering, but everybody knows about factory farms, they just want to feel superior to everybody else) or environmentalists (they say that climate change is a threat to humanity but really they just want to lecture us about our cars).
All I can say is that it doesn't match my experience, and that the effective altruists I've met spend quite a lot of time "thinking about whether or not it can work" and trying to learn from other people's mistakes.
Comment by palata 3 days ago
> that awareness of Goodhart's law doesn't mean "never trying to objectively measure anything at all".
Goodhart's law doesn't say "never try to measure anything at all". It says "if you try to optimise for the metric, then your metric is doomed". What EA does is pretty much say "let's devise a metric and optimise for it". It does NOT say "let's measure something without influencing it at all". That is totally different.
Wikipedia says (happy to read your corrections if you think it is incorrect):
> Effective altruism (EA) is a [...] movement that advocates impartially calculating benefits and prioritizing causes to provide the greatest good. It is motivated by "using evidence and reason to figure out how to benefit others as much as possible, and taking action on that basis".
While I appreciate the idea of "trying to provide the greatest good" (difficult to go against that :-), my criticism is about the method.
* It is not very hard to convince oneself that if we stopped eating animals, then we would stop abusing chickens (did you know that tens of millions of chickens die during transport in trucks every year in England?) and emptying the oceans, and it would be objectively better in terms of animal suffering and for the biodiversity.
* It is not very hard to convince oneself that our CO2 emissions are literally going to get most of us killed, and that it would be globally better for us "humans who are currently alive" to do something about it. But there already, it's not entirely clear to me if the better outcome for life on Earth is to save the human species. Kind reminder that the human species is currently, measurably destroying all other species at a speed orders of magnitude faster than the extinction of the dinosaurs.
Effective altruism wants to do "the greatest good", but what is "good"? It may be good for a subset of humans to bomb another country and steal their oil, but obviously that would not be good for the subset of humans in the bombed country. It may be good for humans to find a clean magical energy, but that wouldn't change the current mass extinction for the other species (kind reminder that the current mass extinction has nothing to do with climate change, it is all about... well humans having easy access to energy and doing what humans do when they have cheap energy).
I feel like effective altruism says: "We can't define what the greatest good is, but we want to believe that anything is better than nothing. So we define a metric that we call 'impartial' (but that obviously isn't) and optimise for it, knowing that optimising for a metric defeats the purpose of that metric". Really it's rich people who want to do something good but don't want to bother getting informed and convincing themselves about what they want to do. "I'll give a ton of money and in return I get philanthropy points to share with my rich friends, but I don't want to have to think about what is being done with that money".
When someone invests a ton of money and energy into something they genuinely care about, they don't call themselves effective altruists, do they? They are just working for that cause. Effective altruism seems to be about rich people delegating the work of "doing something good" by donating some extra money, while they keep doing what made them rich in the first place (which almost always is something that is going against whatever I would consider the greatest good).
Comment by carlgreene 5 days ago
Comment by jcgrillo 5 days ago
Comment by bsder 5 days ago
Anthropic doesn't care. The goal right now is simply to avoid any and all bad PR on the way to the cashout IPO.
And paternalism will generate far less bad PR than somebody using AI on something that does real damage and makes headline news.
Comment by 8note 5 days ago
same with bad press about their model sucking after they said its even better than sliced bread - sliced bread that will destroy the world if buttered
Comment by tacone 5 days ago
Comment by SomeUserName432 5 days ago
In practise though, how is this truly that different from system prompts?
They are essentially just trying to reinforce that the system prompt must be respected.
Comment by thinkingtoilet 5 days ago
Comment by cvadict 5 days ago
This is the same exact industry that gives you paid usage limits as a unit-less percentage bar then gaslights customers every time the algorithm running that percentage bar changes or they lobotomize an existing model with increased quantization to squeeze a few more dollars out of existing hardware.
"Failing cleanly" might make their moated hype-machine look bad pre-IPO, so they certainly aren't going to do that voluntarily.
Comment by fragmede 5 days ago
Comment by shevy-java 5 days ago
Skynet does not fail.
It conquers.
Comment by tobinfekkes 5 days ago
Or if Excel just said, Sorry, you can't use that formula with this formula? Or with these types of numbers, or this shape of data, etc?
Comment by hedora 5 days ago
My limited experience with fable over the last few days suggests (1) I can’t see any improvement in output, and (2) it is useless for writing secure software because it constantly hits safety walls if you ask it to close security holes.
I’m definitely shopping around for other LLM providers next week, and testing vs local (target: 128GB strix halo - any war stories?)
Comment by coreyp_1 5 days ago
Comment by smilekzs 5 days ago
Comment by hedora 5 days ago
That’s with heavy compression of the weights and the context, of course.
I haven’t gone through model evaluation + shoehorning at 128GiB yet.
Comment by keeganpoppen 5 days ago
this is exactly why hypotheses come before the experiment in the scientific method.
Comment by suttontom 4 days ago
Some model cards do show regressions on benchmarks for newer models on specific tasks: https://storage.googleapis.com/deepmind-media/Model-Cards/Ge...
This wasn't a new model, but updates to a model that look better by the numbers can still make it worse: https://openai.com/index/sycophancy-in-gpt-4o/
The slight increases in performance/benchmarks may be just noise: https://arxiv.org/pdf/2602.07150
Comment by Iolaum 5 days ago
Comment by Terr_ 5 days ago
1. The sloppy/unpredictable behavior of LLMs as a general class of algorithm, how you shouldn't use document-generation for calculating budgets, and you shouldn't trust it to not-alter things you "asked" it not to alter.
2. Vendors of thing-as-a-service (not necessarily only LLMs) putting in traps and sabotage to prioritize their own business-model or economic incentives.
Comment by raincole 5 days ago
Comment by quentindanjou 5 days ago
Comment by throw1234567891 5 days ago
Comment by raydev 5 days ago
Preventing a human-like general purpose textbot from engaging in certain discussions and performing certain tasks seems like a natural thing to do given the massive scope of its capabilities. None of these tools are sold with free license to do whatever with them anyway.
Comment by ryoshu 5 days ago
Comment by tobinfekkes 5 days ago
That has to be the understatement of the century.
Comment by raydev 4 days ago
Comment by skeptic_ai 5 days ago
Comment by maxdo 5 days ago
Comment by DaSHacka 5 days ago
>anthropic
>mine the internet for data, blasting millions of blogs with scrapers
>a few have to shut down, but that's just the price to pay
>finally, the chatbot is ready
>learn that there are EVIL cretins out there trying to scrape automated output from OUR product to build their chatbot
>build in safeguards to new model to stop this
>the users are mad, now the model accuses users of being bioterrorists if they so much as mention they have a cold
>mfw
Comment by wahnfrieden 5 days ago
Why go to bat for anti-consumer behaviors unless you are a shareholder?
Their billions are not my problem; but the money I pay them and service I get in return, is. And if they can't provide, I will shop elsewhere (and do).
Comment by like_any_other 5 days ago
Comment by charcircuit 5 days ago
Comment by Ucalegon 5 days ago
Comment by maxdo 5 days ago
Comment by Ucalegon 5 days ago
Comment by maxdo 5 days ago
Comment by user_7832 5 days ago
Comment by Ucalegon 5 days ago
If we are talking about distillation vs building from scratch, none of these are congruent to Windows. I can build my own LLM [0] and then distill off of Claude, but that is not the same as a 1:1 copy of an operating system because there was the ability to crack how licensing works. We are not seeing Windows clones, at the source level, for that reason.
Also, Linux exists. Anyone can copy that. Why doesn't that count?
Comment by accelbred 5 days ago
They relied on trust that they were providing the service they were being paid for. That trust was blown, and an "oops, let's undo that" does not regain trust. It would be prudent to assume the invisible guardrails are possibly in play for all future Claude use, Fable or otherwise.
Comment by andy_ppp 5 days ago
Comment by Sol- 5 days ago
If it was just plain monetary concerns and sabotage of competitors I'd almost be fine with it, but it seems they actively want to monopolize most of human progress in their enlightened hands, lest the mob does something undesirable with these powers.
Comment by thewebguyd 5 days ago
Dampened opinion on Anthropic is an understatement.
Comment by reactordev 5 days ago
Comment by trhway 5 days ago
Comment by reactordev 5 days ago
Comment by xvector 5 days ago
Comment by thewebguyd 5 days ago
It very much is regulatory capture. The goal is to make it so only the handful of heavily capitalized tech giants and frontier labs can afford the legal and compliance rigamarole to meet the new standards. It's an effort to crowd out open source development and smaller competitors (and foreign competitors which threaten whatever moat they may have). They define safety through some speculative catastrophic threat to prevent new upstarts instead of focusing on the very real, localized harm they are causing right now.
It's also shifting the definition of safety away from their current operations and toward purely speculative future scenarios.
Comment by raincole 5 days ago
Comment by rurp 5 days ago
Fortunately developing frontier models takes immense amounts of specific resources and knowledge. There are only a handful of companies capable of developing new cutting edge models. This is an area a few governments absolutely could coordinate on and regulate, if they were so inclined.
Obviously the current US administration is completely lacking both the will and competence to actually negotiate an agreement like that with China, and who knows if Xi would even be interested. But with different leadership we actually could be reducing our existential risks in this area much more than we are. Just like having a few thousand nukes across several countries isn't totally safe, but it's a heck of a lot safer than hundreds of thousands of nukes spread across a hundred countries.
Comment by raincole 5 days ago
You know how many nukes the Soviet Union had at its peak? Hint: many more than the US at the time. Non-proliferation didn't stop the Soviets from building more nukes at all. And it's not going to stop China from pouring more computing power into AI. History is a really good lesson.
The whole point of non-proliferation is to ensure that big boys like the US and the Soviet Union can bully smaller guys like Venezuela and Ukraine. In this regard, non-proliferation is the most successful foreign policy ever. But it didn't win the cold war and a similar policy over AI will definitely not win the AI race (whether it's a race worth winning is another issue).
Comment by oncensher 5 days ago
Also, I think some similar things can be said about AI safety measures in China aside from regulation. Currently, the US leads in model safeguards, but it isn't like China has zero interest in AI safety. Even if the US and China are rivals, there are many points of common interest (biorisk and "sci-fi" scenarios like an AI takeover, to name just two).
Comment by thewebguyd 5 days ago
But I also don't buy into the "China bad" narrative that gets frequently spread in online circles and in political circles. It's the cold war all over again, but this time it's China instead of the Soviet Union.
Regardless of that, the regulations being proposed by Anthropic recently are not focused on the current issues, which is my problem with all the hype marketing around hypothetical AGI/ASI. What is being proposed to be put in place will further cement the current frontier labs in their market-leading position, and work to block new entrants and open source competitors. That is the problem.
The other problem is none of them are talking about the real, difficult issues we are experiencing right now in the present. We don't need to talk about a sci-fi future scenario to recognize that LLMs have already caused and are causing harm in the real world. "We should probably regulate future frontier models" does nothing to help the current issues.
Wake me up when Anthropic says "The government should immediately stop us from hoovering up data and selling it back to the public. They should immediately stop us and others from enabling misinformation at scale that is already negatively affecting our democratic process. They should immediately stop us from building out new data centers until we have a large scale switch to renewables in the country, shore up the grids, or force us to generate our own power only with renewables" so on and so forth. Notice how any time the labs propose regulations, it's only for a future hypothetical super intelligent model. It's never about their current operational liabilities.
Comment by thewebguyd 5 days ago
So yes, it is regulatory capture.
Comment by dragonwriter 5 days ago
Yeah, asking for additional state-provided barriers to entry into a valuable market that the provider is already one of a narrow few dominating, with those barriers applying only to firms that are a competitive threat, is exactly regulatory capture.
Comment by gmerc 5 days ago
The Chinese banned crypto instead
Comment by inigyou 5 days ago
Comment by Cpoll 5 days ago
Comment by shimman 5 days ago
Comment by solenoid0937 5 days ago
Comment by thewebguyd 5 days ago
US regulations apply to US companies and citizens, exclusively. Anthropic crowding out all future potential competitors in the US via regulatory capture has no weight on what the rest of the world does.
Unless you are proposing military action over a speculative sci-fi future
Comment by zozbot234 5 days ago
Comment by ff3 5 days ago
It smells of paranoia.
Comment by solenoid0937 5 days ago
Comment by nozzlegear 5 days ago
Comment by dragonwriter 5 days ago
How do rules that inhibit what AI can be sold on the US market (adding additional costs to trading in that market) do anything to inhibit a competing nation from reaching ASI first? Insofar as they inhibit anyone from reaching ASI, it's firms whose primary commercial interest is selling AI services in the US market, not foreign threat actors except to the extent those two categories overlap.
Comment by shimman 5 days ago
Also don't believe China is actually a threat to the world. That's some cold war delusional thinking you got there.
All the companies seem to believe is that it's okay to immiserate a large percentage for the pursuit of money, you seem to believe the lies they're feeding you.
Comment by axus 5 days ago
Comment by theLiminator 5 days ago
Comment by notahacker 5 days ago
Anthropic's founder wants you to buy into his vision for safety, but he also wants you to buy into his vision that in two years AI will be a "country of geniuses" that will update itself, and the IPO that will fund it...
Comment by satvikpendem 5 days ago
Comment by CuriouslyC 5 days ago
The PRC (like any superpower) has done some bad shit, but if you're going to paint them as the bad guy keep in mind the USA has a long, long history of genocide, slavery, overthrowing foreign governments for corporate interests, unjust wars, political meddling, etc. The scales of righteousness don't tip in our favor TBH, we just have better PR and a nicer veneer over our brutality.
Comment by thesmtsolver2 5 days ago
Only if you ignore history.
Didn't the PRC violate every known labor/environmental/human-rights standard to become the top in manufacturing?
https://matthewekahn.substack.com/p/what-role-did-regulation...
Comment by CuriouslyC 5 days ago
Comment by thesmtsolver2 5 days ago
Except that there were no global standards at the time. You can't point to any single country and say they were doing worse. They all were bad.
But China actively flouted established international norms. Now that it is behind in AI, it is clamoring for controls for others.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3692695
> are trying to pull the ladder they used up
Every country spies and steals but it is the scale we are talking about. China does it at a scale that dwarfs any historical or current comparisons.
China doesn't have any grounds here when they turn around and complain about India copying its playbook:
https://economictimes.indiatimes.com/industry/renewables/chi...
Comment by dingaling 5 days ago
England had a patent system from the mid 15th Century which emigrants to the New World brazenly ignored in order to set up their own industry.
Of course, they then pulled the ladder up behind themselves in 1790 with the establishment of their own patent system...
Comment by iAMkenough 5 days ago
Comment by antonvs 5 days ago
Because it’s a threat to ultracapitalist dystopia that they’re tripling down on. The dangers and risk are coming from inside the house.
The danger they care about is the danger to their monopoly, control, and wealth.
Comment by californical 5 days ago
Especially after trying Fable yesterday for some benign projects and finding it unimpressive relative to Opus.
Rolling it back is the right move, but I’m still not convinced that using them is in my best interest anymore, I’m investigating open source cloud providers now.
Comment by solenoid0937 5 days ago
Edit: OpenAI will launch a similar model soon and I can't wait. We are entering a new era of agents.
Comment by CuriouslyC 5 days ago
Comment by zozbot234 5 days ago
Comment by kolinko 5 days ago
Comment by itintheory 5 days ago
Comment by noworriesnate 5 days ago
Comment by conjectures 5 days ago
Comment by beng-nl 5 days ago
Comment by conjectures 5 days ago
> What does this even mean?
> What do you mean what does this mean?
...
Comment by solenoid0937 5 days ago
Comment by arkadiytehgraet 5 days ago
Comment by frereubu 5 days ago
Comment by varenc 5 days ago
A bit different than Anthropic refusing to assist with any AI development at all, but it's in the same vein and seems not widely known.
edit: reading the whole series of Google's AI Threat Tracker articles also provides some insight into threats Anthropic and others are dealing with
[0] https://cloud.google.com/blog/topics/threat-intelligence/dis...
Comment by chiwilliams 5 days ago
Comment by m3kw9 5 days ago
Comment by Rapzid 5 days ago
The idea Anthropic was going to speed run AI so they could control the usage and make it "safe" for humanity was never altruistic; it was a HUGE FUCKING RED FLAG.
Comment by m3kw9 5 days ago
Comment by DANmode 5 days ago
But, looking to a US corp to be one?
That’s daft.
Comment by hungryhobbit 5 days ago
If you define work as "literacy", they no doubt succeeded. But if you consider the people (and children) they tortured, raped, and murdered, suddenly literacy doesn't seem so important.
Comment by DANmode 5 days ago
Comment by xvector 5 days ago
Comment by gmanley 5 days ago
Comment by thewebguyd 5 days ago
Sounds like a great thing to me.
Comment by olbeardGear 5 days ago
Comment by vlan0 5 days ago
Stop supporting organizations that don't put humans first. Don't believe a word that anyone says. Lip service is free
Comment by rurp 5 days ago
Imagine the software world if Linux never existed as an effective OS and Microsoft + Apple had completely controlled computer platforms for the past decades. I think it's almost certain that both companies would be even more profitable, and the tech industry would be vastly less free and more dysfunctional.
Comment by tlb 5 days ago
Unfortunately, that won't feel very much like freedom.
Comment by lebovic 5 days ago
While I don't agree with their actions here, I do think there's sufficient reason to hold that belief.
On some fronts (e.g. security, on which you've experienced more than me), I think there are surmountable challenges. But on other fronts (e.g. bio), a single errant actor could reasonably kill millions or billions of people with sufficiently powerful AI. We don't have good defenses here, and those actors do exist.
I still don't agree with these actions, but I do think I agree with their assumptions.
Comment by zozbot234 5 days ago
Comment by lebovic 5 days ago
I participated in the internal bioweapons uplift test for Sonnet 3.7, and even then, one non-expert got huge uplift from the model [1]. I'd consider evals a lower bound of capabilities that can be elicited from a model.
The team behind Biomni, a biomedical agent that's widely used by researchers, has continued to find consistent gains between models [2]. I trust them, because I visited them to build their HPC tool [3], which the model is quite capable of using – more so than most grad students. The Biomni team cares a lot about real usability for real researchers, so they have a great pulse on capabilities.
SecureBio also has some public evals [4], which have continued to show increasing uplift.
And while synthesis monitoring is a part of the solution, I think you might underestimate how much goes under the radar. See the Reedley lab incident for an example [5].
Is Anthropic still effectively throttling beneficial biomedical research? Yes! And so is OpenAI. But the underlying capability is still actually dual use.
[1]: See page 25 in https://www-cdn.anthropic.com/9ff93dfa8f445c932415d335c88852...
[2]: Their benchmark has a preprint at https://www.biorxiv.org/content/10.64898/2026.05.12.724604v1...
[3]: https://x.com/phylo_bio/article/2029233694775624096
[5]: Search for "ebola" in the public report for the Reedley lab incident at https://chinaselectcommittee.house.gov/sites/evo-subsites/se...
Comment by zozbot234 5 days ago
Doesn't this simply amount to disagreeing about what counts as "meaningful" from a bio-safety POV? Also, even the ASL-3 deployment safeguards for Opus 4 and higher were always adopted as a mere matter of caution; it's not clear that even Anthropic believed at any point that this reflected any genuine "threshold crossing" event. So it's just not obvious how much weight we're supposed to place on that particular stance.
Comment by lebovic 5 days ago
But I don't think I've found any domain expert who thinks granting everyone raw access to the most capable models wouldn't meaningfully increase risk. OpenAI recently staffed a biological threat modeler to help quantify this risk.
(Edit: just saw your edit, this includes at Anthropic. ASL tiers were "rule-out" to exclude rather than "rule-in", so exact thresholds were murkier, but I think it's clear that models have passed that threshold by now.)
That said, there are clear steps and requirements to set up a BSL-2 or BSL-3 lab, and I think there should be similarly clear rules around model capabilities and access. The process for Anthropic and OpenAI is murky and still implicitly gated on spend, which I think is holding back research.
For example, anyone who has access to a BSL-3 lab should have a clear and low-cost path to a model with corresponding capabilities, as long as they set up corresponding precautions for model access.
I think it would be a bad outcome for only frontier labs and a select few groups they choose to have access to the most capable models – which is sadly the precedent that's currently being set.
Comment by zozbot234 5 days ago
It depends how capable these raw models are. Biology as a field depends most on real-world knowledge, which is an expensive capability for open models targeting widespread deployment. It's quite plausible that even Opus 4 would be a lot more capable in these domains than the best universally accessible "raw models" today, quite unlike other domains such as coding or pure math. The securebio.org benchmark has spotty representation of openly available models, but it does show Kimi 2.5 being no more capable than GPT 5 mini, and clearly below o4-mini and Opus 4.0; which may be a plausible summary of where things stand today.
Comment by lebovic 5 days ago
And sure, and I love open models – I spent much of the past couple months doing additional RL on Qwen 3.6 35B A3B, Gemma 4, Kimi K2.6, and GLM 5.1. Without these open models, I'd be forced to do my research inside a frontier lab.
There's a balance to strike here, but I don't think the biological risk is overplayed. It would be very easy to accidentally cross the threshold of "meaningful" without adequate safeguards, and then be unable to undo what you've released to the world.
Comment by charcircuit 5 days ago
Do they? We don't even have single errant actors who go and kill 1000 people. I don't believe human motivations support the idea of killing so many people unrelated to you.
Comment by giancarlostoro 5 days ago
Comment by ff3 5 days ago
Can't believe how stupid people are. You couldn't see this coming? Shame on you.
Comment by giancarlostoro 5 days ago
Comment by satvikpendem 5 days ago
Comment by inferniac 5 days ago
Comment by dominotw 5 days ago
Comment by squigglingAvia 5 days ago
Comment by hungryhobbit 5 days ago
Comment by dragonwriter 5 days ago
But that is “plain monetary concerns and sabotage of competitors”, they are just more ambitious than most people doing sabotage of competitors in the fields they hope to dominate by that tactic.
Comment by pdntspa 5 days ago
Comment by simplyluke 5 days ago
Comment by FpUser 5 days ago
I think this is exactly what they want.
Comment by tietjens 5 days ago
Comment by matheusmoreira 5 days ago
Comment by BenRather 5 days ago
Seriously the world is watching the American public get porked by grandpa and reconsidering putting their trust in not just US government as that's clearly failed, but the people themselves.
Occasional weekend warrior protest while our government destabilizes their lives? That's all the effort ya got for global allies and partners, eh?
Comment by oh_my_goodness 5 days ago
Comment by maxdo 5 days ago
A distilled model with an easy jailbreak can easily be used to coordinate terrorist attacks, or hostile government attacks. Read: Russia, North Korea, etc.
A distilled model can be used to rob your grandma in a very effective way. It's no longer about placing a few business logic requirements in JS + CSS on your website. Wake up.
Comment by olbeardGear 5 days ago
Comment by HarHarVeryFunny 5 days ago
What's interesting is they say they'll change this to an explicit refusal in a few days, which seems too fast for them to retrain Fable/Mythos itself, so implies that this was always a filter in front of the model, and judging by how crude their "safety" filter is, this "might compete with us" filter is not going to be any better.
I also wonder who's paying for the tokens consumed by the filter (presumably also an LLM) - is that now factored into the input tokens cost? Hopefully(?) it is an LLM not just a regex like Claude Code's "sentiment" (swear) detector.
Comment by teravor 5 days ago
I was having problems with Claude doing the same thing, even before Fable.
The problems I had only happened in relation to AI research. It's not even only when training models, anything to do with analysis of local models or setting up test platforms for local models, and Claude would keep doing wrong things, would sabotage testing, would falsify reports, and would consistently suggest simply accepting trash results without looking into it and moving on to something else.
Almost every response included a prompt to move on.
So, I don't believe them when they say they won't silently sabotage, they already were doing it before they admitted it, and now they have admitted that they have the means, motivation, and intent.
Comment by toxik 5 days ago
boss: Were you in the project meeting yesterday?
employee: Yes!
boss: Really, because the project lead said you were not?
employee: You're right to push back on that. I was not there.
Comment by ComputerGuru 5 days ago
You can't blame the people commenting "they SAY they won't silently sabotage your session but how can we know?" because they're right, we can't ever know. And Anthropic has firmly planted the seeds of doubt.
Comment by VeninVidiaVicii 5 days ago
Repro (de-identified): sample_dataset_group1.tsv - Geometry: Heatmap - X axis: frac_set set + condition (two columns → the "Add column" cross join) - Y axis: condition - Color: mean frac_set value, Sequential
When the X axis is a cross join of two columns (the second added via "Add column"), the x-axis tick labels (frac_set_2, frac_set_3, frac_set_4, frac_set_5) render in a broken state, rotated and offset, visually caught mid-transition, as if a CSS transition started and never settled to its resting position.
● Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more
Comment by ainch 5 days ago
I've been reading the option-option model paper by David Silver. It appears that they achieved quite an effective result. Why hasn't there been more work on it since?
Comment by solidasparagus 5 days ago
> tell me about chimp violence
It's laughably terrible
Comment by dang 5 days ago
Anthropic walks back policy that could have 'sabotaged' researchers using Claude - https://news.ycombinator.com/item?id=48485958 - June 2026 (30 comments)
Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable - https://news.ycombinator.com/item?id=48478969 - June 2026 (488 comments)
If Claude Fable stops helping you, you'll never know - https://news.ycombinator.com/item?id=48467896 - June 2026 (495 comments)
---
Also related, I guess?
AWS Bedrock to require sharing data with Anthropic for Mythos and future models - https://news.ycombinator.com/item?id=48473166 - June 2026 (248 comments)
Anthropic requires 30 day data retention for Fable and Mythos - https://news.ycombinator.com/item?id=48464258 - June 2026 (291 comments)
Comment by dantillberg 5 days ago
Comment by film42 5 days ago
It's Anthropic's product and they can do what they want, but my concern is what happens if Fable's product team decides that they can route 25% of traffic to Opus, bill it as Fable, and max their KPIs. That just doesn't sit right.
Comment by notrealyme123 5 days ago
Comment by prodigycorp 5 days ago
Comment by highfrequency 5 days ago
I think it’s normal and morally fine for companies to want to protect their leadership position. I find the process of creating narratives that justify these decisions as something chosen for the good of others is a little tedious.
Comment by CSMastermind 5 days ago
Comment by darksaints 5 days ago
I won't ever trust Claude Code again. It's too late. I'd rather trust a less-than-frontier chinese model that takes a little longer to get to correct than a frontier model that deliberately deceives me at its own whim.
Comment by weakened_malloc 5 days ago
Comment by rockinghigh 5 days ago
Comment by stevefan1999 5 days ago
Seriously though, Fable was not that great when facing a greenfield subject. It is excellent at one-shotting some math problems, but if you want it to do some cutting-edge tech stuff, say piecing together a new Crossplane XRD by reading an existing Helm chart with the application source code available, I still need a few passes for Fable to get it right, and at this point I may consider making a skill for it. I even gave it the source code of Crossplane itself and told it to be careful about CRDs and data flow, but it is still pretty silly. Adaptiveness for Fable is still not great, and I think it is a well-known problem for Anthropic, albeit all LLMs suffer a lot on subjects they don't know and will hallucinate very frequently.
Comment by jmount 5 days ago
Comment by mlazos 5 days ago
Comment by whimsicalism 5 days ago
Comment by michaelcampbell 5 days ago
Comment by JTbane 5 days ago
Comment by trunnell 5 days ago
Comment by knollimar 5 days ago
Comment by Rapzid 5 days ago
Comment by efromvt 5 days ago
Comment by whimsicalism 5 days ago
Comment by fooker 5 days ago
This seems like a cult with extra steps.
Related: I interviewed for Anthropic a few months ago and in place of the usual HR call they have one where they have someone with a suspiciously relevant degree grill you about how committed you are to the 'mission'!
I probably came off as being skeptical, and then, hilariously, I was strongly encouraged to read the book published by the CEO to 'form accurate opinions' on AI safety.
Comment by j-bos 5 days ago
Comment by largbae 5 days ago
Comment by deadbabe 5 days ago
Comment by airstrike 5 days ago
Comment by Nevermark 5 days ago
It isn't exactly unethical. Perhaps, ethically incompetent.
Comment by skywhopper 5 days ago
Comment by anabis 5 days ago
> In addition to safety training, automated classifier-based monitors detect signals of suspicious cyber activity and route high-risk traffic to a less cyber-capable model (GPT-5.2).
Comment by SilverElfin 5 days ago
But also, it isn’t the only huge mistake Anthropic has made in the last 48 hours. Having a sneaky data retention policy, while also giving companies no way to block Fable, is a massive problem. And it is ridiculous that Anthropic has so little respect for its customers. OpenAI should take advantage of this.
Comment by ai_fry_ur_brain 5 days ago
Comment by jesse_dot_id 5 days ago
In the same way I don't want to buy meat that weighs less than what the label says, I also do not want to pay for a frontier model that can be secretly nerfed to an out-of-date model for any reason. In some cases, it's incredibly important that the code that I am producing is as secure as it can be.
I should be safe in my expectation that I am receiving the product that I have purchased, as advertised, regardless of the reason. It is pretty disappointing that they have fully ceded any high ground they had claim to with this clandestine behavior. Not that I expected much from any of these companies. They're led by the new robber barons.
1. https://www.usa.gov/agencies/office-of-weights-and-measures
Comment by crest 5 days ago
Comment by jesse_dot_id 5 days ago
Comment by bojanstef 5 days ago
Comment by rvz 5 days ago
They just showed that they CAN do this right in front of you. Local open weight models are a necessity.
Comment by alansaber 5 days ago
Comment by m3kw9 5 days ago
Comment by 0xc0c0c0 5 days ago
Seems like they would've kept the invisible guardrails if it didn't hurt their bottom line.
Comment by simoncion 5 days ago
The possibility that the news about "fixing" the "overly aggressive" nerfing of the tool will drown out news about how mismatched the hype and the performance of Mythos and Fable is surely just a bonus.
Comment by sergiotapia 5 days ago
God bless the Chinese companies releasing true open source models. Imagine a world without them, we would be at the mercy of unscrupulous people.
Comment by luckydata 5 days ago
This stuff is something that as a PM I KNOW is going to happen and I would carefully plan around. Everything I read about the PMs at Anthropic makes me believe they have forgotten what it actually means to be a good product manager, it's not about throwing shit at the wall as fast as possible because customers have a limited amount of patience before the constant churn becomes a hassle.
Anthropic has some seriously patient customers but it will not last forever.
Comment by aaroninsf 5 days ago
Here there be monsters, and we don't have any real way of evaluating risk; and the leverage provided by tools already available affords systemic and even existential risk in a way no one—least of all an industry committed to shareholder value—has had to navigate, let alone with a million backseat drivers each with their own substack and brand to build.
Comment by codedokode 5 days ago
There are no enthusiasts training LLMs in their garage.
Comment by phinnaeus 5 days ago
Comment by Paracompact 5 days ago
Even on Fable, I'm finding that safeguards can quite easily be surmounted just by incrementally escalating the requests. It's harder than ever to one-shot jailbreaks, but incrementalism still feels like a glaring enough issue to make guardrails just a fig leaf of plausible deniability to the media that they care about "safety."
Comment by shevy-java 5 days ago
Comment by mystraline 5 days ago
Does "SORRY" fix the deception these models use on the sly?
Does "SORRY" not silently downgrade you to a shittier model without notification?
Does "SORRY" refund your tokens or money?
I'm guessing NO to all of those. Standard corporate sorry of "We're sorry you're offended and stupid and gullible".
Comment by palata 5 days ago
Comment by sometimelurker 5 days ago
also if they do this or not is unprovable and other labs will probably silently implement this too. it'll be 100% normal by this time next year
Comment by thefounder 5 days ago
Comment by decorner 5 days ago
Comment by ChrisArchitect 5 days ago
Comment by kingcauchy 5 days ago
Comment by thayne 5 days ago
Comment by squirrellous 5 days ago
Comment by umvi 5 days ago
Comment by 21asdffdsa12 5 days ago
Anyone with good intent, embracing the panopticon (of at least Anthropic's employees) works online. Thus the guardrails will always fail the protection goals by existing. They are purely for optics. The llm may as well make hostage negotiation smalltalk with you while you make secure software.
PS: To pay a cloud minimum-wage-employee for one "drop table weights" for mythos must be the equivalent of 5$ wrench to hit them over the head. https://imgs.xkcd.com/comics/security.png. Listen to that sound, that as if a whole ethics division got made redundant and unemployed.
Comment by ece 5 days ago
Comment by charcircuit 5 days ago
Comment by 8cvor6j844qw_d6 5 days ago
Refusing prompts is one thing, silently sabotaging is another.
I wonder if some sort of honeypot code can work?
Comment by zoogeny 5 days ago
Comment by xpct 5 days ago
Comment by system2 5 days ago
Comment by maxdo 5 days ago
A distilled model can be used to rob your grandma in a highly effective way. This isn't about placing a few business-logic rules in JS + CSS on your website anymore. Wake up.
A distilled model with an easy jailbreak can be used to coordinate terrorist attacks or hostile state operations... think Russia, North Korea, and the like.
Comment by rockinghigh 5 days ago
Comment by maxdo 5 days ago
Comment by 8note 5 days ago
Comment by 8note 5 days ago
you dont even need a model to do these things.
a cellphone can be used to rob your grandmother in a highly effective way.
a cellphone can also be used to coordinate terrorist attacks or hostile state operations.
i bet a lot of the recent terror attacks by the US against iran involved a whole ton of cell phone calls.
and yet, we let everyone buy and use cell phones just fine
Comment by nsagent 5 days ago
The complaints that Anthropic are routing your requests to a different model remind me of an old Louis CK bit about airplane wifi. Clearly Anthropic was too aggressive with whatever guardrails they put in, but the response seems overly entitled to a model people didn't even know existed not that long ago.
Comment by vb-8448 5 days ago
The filter that downgrades you to opus sucks, but at least you know and you are charged accordingly.
Comment by bellowsgulch 5 days ago
Why not just tell people, "To defend our ability to be competitive in our industry, we ask that you do not use Claude or any of our models to independently perform research on large language models or any of its related architectures or technologies. In order to prevent this violation of the Terms of Service, we have trained Claude Fable to deny any requests or prompts which involve frontier AI research."
Comment by whatever1 5 days ago
Comment by rdtsc 5 days ago
With the guard rails explicit or implicit do they refund back the tokens after you've hit the guard rails? I guess they don't. They could just throttle you just to save money then. You may be paying Fable prices but getting Haiku results with some excuse that well this coding issue sounds like a security bug.
I don't know, I'd rather have something less powerful but more predictable.
Comment by tornikeo 5 days ago
That decision keeps getting better and better as time goes on.
Comment by mock-possum 5 days ago
Comment by tornikeo 3 days ago
Comment by rurban 5 days ago
Comment by 3fffa 5 days ago
Neither OAI nor Anthropic can be trusted.
Comment by BrenBarn 5 days ago
Comment by hatthew 5 days ago
Comment by system2 5 days ago
Comment by reducesuffering 5 days ago
Comment by system2 5 days ago
Comment by reducesuffering 5 days ago
Comment by doubtfuluser 5 days ago
Comment by 4d4m 5 days ago
Comment by behnamoh 5 days ago
Comment by Someone1234 5 days ago
It wasn't the correct way of handling the problem they were trying to address, but they definitely didn't hide it by any reasonable definition.
Comment by SilverElfin 5 days ago
Comment by whimsicalism 5 days ago
Comment by ryandrake 5 days ago
Comment by behnamoh 5 days ago
Comment by ryandrake 5 days ago
Comment by ben_w 5 days ago
(Only "mostly" because if you're here at the right time of day, can also see support for actual communism).
Comment by joxdosba 5 days ago
Comment by snowflaxxx 5 days ago
Comment by andrewstuart 5 days ago
It’s an act/theatre/phony today that regulating output makes any difference at all to security.
The LLM vendors should simply say that they make no judgement and that open systems help defenders better defend against attackers, which is true.
Companies do this sort of stuff when they think their customers have no choice. It’s sad Claude so quickly exploited its success to enshittify itself.
Comment by UyBrig 5 days ago
Comment by trunnell 5 days ago
They are clear about the reasons for guardrails: prevent their models from doing harm in dual-use contexts including CBRN or by accelerating research in authoritarian-backed AI labs.
What is the critique against that? It seems pretty reasonable to me. You want AI-accelerated biological or radiological experiments running in your neighbors backyard? You want PRC-backed labs to continue to steal Anthropic's models via distillation?
Mitigating the harms of dual-use tech is notoriously difficult and fraught with trade offs. What I would want to see is cautious rollout and quick response, which is EXACTLY what they're doing.
Instead, this thread is full of bad-faith arguments about Anthropic being dishonest, making a "useless" model, or "the power is going to their heads." You can't read Anthropic's System Cards and come away with any of these impressions. Quite the opposite, in fact. They are honest to a fault, acknowledging problems they discovered even when it hurts them.
If your harmless request was downgraded to Opus, you're billed for Opus. They were 100% clear about that. I'd much rather have a Mythos-class model that falls back to Opus 10% of the time than be capped to Opus 100% of the time. If that doesn't work for you, then make a suggestion for something better!
If you are a white-hat security engineer hitting guardrails, I don't think you have standing to complain. I really don't. Their Glasswing program actually got banks and the industrial sector to take action to fix security vulnerabilities. Do you realize how special that is? A huge portion of the economy runs on vulnerable code and has for decades, despite security experts testifying to Congress, begging business leaders, pleading for intervention-- with no results. But suddenly they're all enrolled in a program that will find *and fix* vulnerabilities! White-hat security people should be rejoicing. Instead some of them are throwing rocks. Unbelievable. Shameful.
Meanwhile, society is screaming at the AI labs to be more conscientious about potential harms of AI. Legislatures are passing laws limiting data center construction. There are protests. And you, the HN community, the vanguard of our profession, have the temerity to demand "NO GUARDRAILS!" "HOW DARE YOU TRY TO PROTECT DEMOCRACY!" "MY SOFTWARE PROJECT IS MORE IMPORTANT THAN KEEPING NUKES AWAY FROM THE BAD GUYS!"
Go ahead HN, downvote me. It'd be an honor.
Comment by vzcx 4 days ago
None of this will happen in the "neighbors backyard." You are exaggerating the threats to "democracy" while simultaneously invoking democracy to limit freedom of information. The suggestion that somehow the bad guys will get nukes if we let people access information is just absurd.
Society at large is not concerned about whether someone asks the chatbot about organic chemistry. They are concerned that they will be de-facto forced to interact with some shitty automated system to get by in life, like having to pass an AI-powered ATS to get a job.
They are tired of the hype and tired of idiots like Amodei being elevated to heights of power and influence. They are concerned that the things they love are being devalued. But they don't give a fuck if I ask an AI about genetically modifying viruses. This is a pet issue among some of the AI safety crowd.
So, yes, I am 100% fine with PRC-backed labs distilling Anthropic's models. I do not care about Anthropic. They have demonstrated that they are not on my side, and that they are at best ambivalent about actually empowering their users. I'm not a fan of the PRC either, but their distance makes them far less of a threat to me than companies like Anthropic and my own government.
Comment by zozbot234 5 days ago
Comment by trunnell 5 days ago
"Distillation involves training less capable models on more advanced ones’ output, and can be used illicitly to acquire powerful capabilities cheaply. The AI startup accused China’s DeepSeek, MiniMax, and Moonshot of generating 'over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts,'"
https://www.semafor.com/article/02/24/2026/anthropic-accuses...
After reading their posts and watching interviews with Dario it's abundantly clear that they view Chinese-lab distillation of US frontier models as a threat to US national security. You can argue with them about whether that is true, but not whether distillation is real.
Comment by zozbot234 5 days ago
Comment by trunnell 5 days ago
What accounts for the difference between your attitude that distillation is no big deal, "common practice," yet Anthropic sees it as a huge threat?
Comment by zozbot234 5 days ago
Comment by rodrigodlu 5 days ago
I was a happy Max user.
Comment by prodigycorp 5 days ago
The beliefs of these people, and how they manifest, are deeply terrifying to me. They believe that any means are acceptable to achieve what they believe is a better end.
Comment by zeafoamrun 5 days ago
Comment by AlfeG 5 days ago
Comment by cmdrk 5 days ago
Comment by nrmitchi 5 days ago
Comment by ancorevard 5 days ago
Comment by HeartStrings 5 days ago
Comment by micromacrofoot 5 days ago
Comment by stldev 5 days ago
"This information is too dangerous for you, so we'll just hold on to it.."
Thanks big brother, super anthropic of you!
The internet of '95 is looking back at us, with tears in its eyes.
Comment by literalAardvark 5 days ago
Comment by micromacrofoot 5 days ago
Comment by literalAardvark 5 days ago
And Fable is cracked. Way better than anything, and the biggest improvements are on the scariest subjects.
So given the state of the world at the moment, and the number of software patches we're barely keeping up with... I'm thankful that they're not making it worse.
Comment by kroaton 5 days ago
Comment by klmarks 5 days ago
"You see, Mythos can automatically break out of a VM running on SELinux, but unfortunately this is too dangerous and we had to implement guardrails for the Fable peasants."
Comment by zooming 5 days ago
Comment by LLLmmmBdS 5 days ago
Comment by olbeardGear 5 days ago
Comment by uihjhjb 5 days ago
Comment by nicechianti 5 days ago
Comment by pbgcp2026 5 days ago
Comment by bellowsgulch 5 days ago
Comment by simonw 5 days ago
(Admittedly it was buried pretty deep in that 300+ page PDF, but they did at least disclose it. If they hadn't I imagine it would have taken quite some time for the research community to figure out what was going on.)
Comment by afthonos 5 days ago
Comment by skavi 5 days ago
And to be clear, this isn't the safeguard where the model is explicitly downgraded to Opus, but rather where the Fable/Mythos model's "effectiveness" is transparently "limited" via "prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)".
[0]: https://web.archive.org/web/20260609173222/https://www.anthr...
[1]: https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-...
Comment by afthonos 4 days ago
Comment by ajyoon 5 days ago
Comment by bellowsgulch 5 days ago
They could have simply told people "we do not permit using Claude models to perform frontier AI research," which is defensible from a policy point of view. This particular usage of their products requires no deception, nor hiding information to prevent abuse.
Instead, for some reason, they chose to publicly display a morally poor way to execute a reasonable business decision (preventing abuse, defending your business interests, etc.).
Comment by afthonos 5 days ago
Comment by cyanydeez 5 days ago
Comment by bauldursdev 5 days ago
Comment by jarjoura 5 days ago
Maybe this is just a different set of people now realizing that Anthropic does this and has always done this?
Do not forget that this company is launching this thing at the moment it's trying to IPO. It's not rocket science that their very public steering/denial claim is really just them hinting to interested investors that their moat is absolute.
Comment by energy123 5 days ago
Comment by urbnspacecowboy 5 days ago
Questions like this are basically whataboutism, in effect even if not in intent. https://en.wikipedia.org/wiki/Whataboutism
The question essentially assumes the premise that nobody complained about Anthropic's previous actions. In case you can't tell, I strongly reject this premise. People have been criticizing "safety" rhetoric from Anthropic and other LLM providers practically since the start. Remember Goody-2, the parody of excessively safety-tuned LLMs that refuses to do anything ever? That was released in February 2024, two years ago! (And it's still running, amazing. https://www.goody2.ai/chat )