No, it doesn't cost Anthropic $5k per Claude Code user
Posted by jnord 1 day ago
Comments
Comment by hirako2000 22 hours ago
It is not. It's a terrible comparison. Qwen, DeepSeek and other Chinese models are known for their 10x or even better efficiency compared to Anthropic's.
That's why the difference between OpenRouter prices and the official providers' isn't that big. Plus who knows what OpenRouter providers do in terms of quantization. They may be getting 100x better efficiency, hence the competitive price.
That being said, not all users max out their plan, so it's not like each user costs Anthropic 5,000 USD. The hemorrhage would be so brutal they would be out of business in months.
Comment by jychang 22 hours ago
Opus isn't that expensive to host. Look at Amazon Bedrock's t/s numbers for Opus 4.5 vs the Chinese models. They're around the same order of magnitude, which means that Opus has roughly the same number of active params as the Chinese models.
Also, you can select BF16 or Q8 providers on openrouter.
Comment by irthomasthomas 18 hours ago
Comment by F7F7F7 13 hours ago
Comment by aerhardt 15 hours ago
Comment by re-thc 20 hours ago
They do have different infrastructure / electricity costs and they might not run on nvidia hardware.
It's not just the models.
Comment by jychang 20 hours ago
Namely, Amazon Bedrock and Google Vertex.
That means normalized infrastructure costs, normalized electricity costs, and normalized hardware performance. Normalized inference software stack, even (most likely). It's about as close to a 1-to-1 comparison as you can get.
Both Amazon and Google serve Opus at roughly ~1/2 the speed of the Chinese models. Note that they are not incentivized to slow down the serving of Opus or the Chinese models! So that tells you the ratio of active params for Opus and for the Chinese models.
Comment by Shakahs 18 hours ago
Comment by giancarlostoro 17 hours ago
Comment by re-thc 17 hours ago
The claim we were responding to was 10x, not 0.5x.
x86 vs arm64 could have different performance. The Chinese models could be optimized for different hardware, so it could show massive differences.
Comment by raggi 16 hours ago
Comment by fennecfoxy 19 hours ago
Comment by dryarzeg 18 hours ago
Comment by fennecfoxy 18 hours ago
Also with Nvidia you get the efficiency of everything (including inference) built on/for CUDA; even the efforts to catch AMD up are still ongoing afaik.
I wouldn't be surprised if things like DS were trained and now hosted on Nvidia hardware.
Comment by re-thc 18 hours ago
They are. Nvidia makes A LOT of profit. Hey, top stock for a reason.
> I wouldn't be surprised if things like DS were trained and now hosted on Nvidia hardware
DS is "old". I wouldn't study them. The new ones have a mandate to at least run on local hardware. There are data center requirements.
I agree it could still be trained on Nvidia GPUs (black market etc), but not running.
Comment by yorwba 17 hours ago
They do? Source?
But if that's true, it would explain why Minimax, Z.ai and Moonshot are all organized as Singaporean holding companies, with claimed data center locations (according to OpenRouter) in the US or Singapore and only the devs in China. Can't be forced to use inferior local hardware if you're just a body shop for a "foreign" AI company. ;)
Comment by re-thc 17 hours ago
They just have a China only endpoint and likely a company under a different name.
Nothing to do with AI. TikTok is similar (global vs China operations).
Comment by grayxu 16 hours ago
Comment by yorwba 16 hours ago
Comment by grumpoholic 15 hours ago
Comment by erichocean 15 hours ago
(Confirmation is faster than prediction.)
Many models architectures are specifically designed to make this efficient.
---
Separately, your statement is only true for the same gen hardware, interconnects, and quantization.
Comment by tom_m 2 hours ago
Their goal (similar to Uber, DoorDash, Robinhood, etc.) is to get mass adoption. Their business models only work at this kind of scale.
It's completely impossible to have consumers pay $20-60/mo and be a profitable business without mass adoption where some are not using it as much as others...and, perhaps more importantly, the masses put pressure on their employers to pay for their tooling. This is why pricing does not need to come down.
Quite literally I have engineers spending over $1,000/mo on Opus. That's the goal.
Comment by Weaver_zhu 20 hours ago
Comment by jychang 20 hours ago
If Opus were 10x larger than the Chinese models, then Google Vertex/Amazon Bedrock would serve it 10x slower than Deepseek/Kimi/etc.
That's not the case. They're in the same order of magnitude of speed.
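A back-of-envelope version of this speed-to-size argument, under the (assumed, simplified) premise that decoding is memory-bandwidth bound, so tokens/sec scales roughly inversely with active parameter count on comparable hardware:

```python
# Toy model: in bandwidth-bound decoding, tokens/sec is roughly
# proportional to memory_bandwidth / active_params, so (all else equal)
# speed_a / speed_b ~= active_params_b / active_params_a.
def implied_param_ratio(speed_a: float, speed_b: float) -> float:
    """Implied ratio of model A's active params to model B's."""
    return speed_b / speed_a

# If Bedrock serves Opus at ~half the tokens/sec of a Chinese model,
# this crude model implies ~2x the active parameters, not 10x.
ratio = implied_param_ratio(speed_a=30.0, speed_b=60.0)  # t/s, illustrative
print(ratio)  # -> 2.0
```

The speeds are illustrative placeholders, and the real relationship is muddied by batching, quantization, and hardware allocation choices mentioned elsewhere in the thread.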
Comment by Filligree 17 hours ago
It could still be 10x larger overall, though that would not make it 10x more expensive.
Comment by torginus 13 hours ago
Which seems to be the case, seeing how hungry the industry lately has been for hard drives.
Comment by bakugo 20 hours ago
According to OpenRouter, AWS serves the latest Opus and Sonnet at roughly the same speed. It's likely that they simply allocate hardware differently per model.
Comment by logicprog 16 hours ago
Comment by bakugo 20 hours ago
Comment by clbrmbr 9 hours ago
Comment by DanielHall 16 hours ago
Comment by yorwba 16 hours ago
Comment by Havoc 18 hours ago
The quantisation is shown on the provider section.
Comment by grayxu 17 hours ago
Comment by simianwords 22 hours ago
I find it a good comparison because it is a good baseline since we have zero insider knowledge of Anthropic. They give me an idea that a certain size of a model has a certain cost associated.
I don't buy the 10x efficiency thing: they are just lagging behind the performance of current SOTA models. They perform much worse than the current models while also costing much less - exactly what I would expect. Current Qwen models perform about as well as Sonnet 3, I think. In 2 years, when the Chinese models catch up with enough distillation attacks, they'll be as good as Sonnet 4.6 and still be profitable.
Comment by coldtea 17 hours ago
Define "much worse".
+--------------------------------------+-------------+-----------+------------------+
| Benchmark | Claude Opus | DeepSeek | DeepSeek vs Opus |
+--------------------------------------+-------------+-----------+------------------+
| SWE-Bench Verified (coding) | 80.9% | 73.1% | ~90% |
| MMLU (knowledge) | ~91 | ~88.5 | ~97% |
| GPQA (hard science reasoning) | ~79–80 | ~75–76 | ~95% |
| MATH-500 (math reasoning) | ~78 | ~90 | ~115% |
+--------------------------------------+-------------+-----------+------------------+
Comment by Filligree 17 hours ago
Lots of models get really close on benchmarks, but benchmarks only tell us how good they are at solving a defined problem. Opus is far better at solving ill-defined ones.
Comment by ACCount37 15 hours ago
OpenAI can sometimes get an edge over Anthropic in hard narrow STEM tasks. I trust benchmarks over vibes there - and the benchmarks show the teams trading blows release after release. Tracking Claude Code vs OpenAI Codex on SWE-bench Verified feels like watching the back alley knife fight of the AI frontier.
But the vibe of "how easy is that model to interact with" and "how easy it is to get it to do what you want it to" does matter a lot when you are the one doing the interacting. And Opus makes for a damn good daily driver.
Comment by devonkelley 12 hours ago
Comment by torginus 13 hours ago
Comment by cmrdporcupine 14 hours ago
GLM5, the largest Qwen 3.5 model, and Kimi K2.5 are more fair comparisons, though they are, yes, a bit behind. They're more than capable for routine operations though.
Anyways, I'm back to using Opus & Claude Code after a month on Codex/GPT5.3 and 5.4 and it's frankly a rather obvious downgrade. Anthropic is behind OpenAI at this point on coding models, and there's nothing to say they couldn't fall behind the Chinese models as well.
The moat is very shallow. After the events of the last two weeks there's likely a significant % of international capital very interested in breaching it. I know I would like to see this... Anthropic basically said F U to any non-Americans, and OpenAI is ... yeah.
Comment by coldtea 17 hours ago
Ah, the "trust me bro" advantage. Couldn't it just be brand identity and familiarity?
Comment by vidarh 16 hours ago
My dashboard goes from all green to 50/50 green/red for our agents whenever I switch from Claude to one of the cheaper agents... This is after investing a substantial amount of effort in "dumbing down" the prompts - e.g. adding a lot of extra wording to convince the dumber models to actually follow instructions - that is not necessary for Sonnet or Opus.
I buy the benchmarks. The problem is that a 10% difference in the benchmarks makes the difference between barely usable and something that can consistently deliver working code unilaterally and require few review interventions. Basically, the starting point for "usable" on these benchmarks is already very far up the scale for a lot of tasks.
I do strongly believe the moat is narrow - with 4.6 I switched from defaulting to Opus to defaulting to Sonnet for most tasks. I can fully see myself moving substantial workloads to a future iteration of Kimi, Qwen or Deepseek in 6-12 months once they actually start approaching Sonnet 4.5 level. But for my use at least, currently, they're at best competing with Anthropic's 3.x models in terms of real-world ability.
That said, even now, I think if we were stuck with current models for 12 months, we might well also be able to build our way around this and get to a point where Deepseek and Kimi would be cheaper than Sonnet.
Eventually we'll converge on good enough harnesses to get away with cheaper models for most uses, and the remaining appeal for the frontier models will be complex planning and actual hard work.
Comment by oren1531 15 hours ago
Comment by vidarh 14 hours ago
Comment by Bombthecat 15 hours ago
Comment by vidarh 14 hours ago
My default model has now dropped to Sonnet, because Sonnet can now do most of my tasks, and we already use Kimi, Deepseek, and Qwen.
They're just not cost-effective enough to be my main driver yet. They are however cheap enough that for things where the Claude TOS does not let me use my subscription, they still add substantial value. Just not nearly as much as I'd like.
The bulk of my tasks won't get harder as time passes, and so will move down the value chain as the cheaper models get better.
For the small proportion of my tasks that benefits from a smarter model, I will use the smartest model I can afford.
Comment by cmb24k 6 hours ago
Comment by devonkelley 12 hours ago
Comment by vidarh 12 hours ago
But of course this is also only viable for non-latency sensitive work, for starters.
Comment by cesarvarela 15 hours ago
Comment by yorwba 16 hours ago
Comment by crooked-v 12 hours ago
I find it really funny that anyone can call it this with a straight face when all the American models are based on heaps of illegally pirated books and TOS-breaking website scraping in the first place.
Comment by lelanthran 22 hours ago
These are not cell phone plans which the average joe takes, they are plans purchased with the explicit goal of software development.
I would guess that 99 out of every 100 plans are purchased with the explicit goal of maxing them out.
Comment by serial_dev 21 hours ago
When I have a feeling that these tools will speed me up, I use them.
My client pays for a couple of these tools in an enterprise deal, and I suspect most of us on the team work like that.
If my goal was to max out every tool my client pays for, I'd be working 24 hours a day and never see sunlight.
I guess it’s like the all you can eat buffet. Everybody eats a lot, but if you eat so much that you throw up and get sick, you are special.
Comment by bloppe 20 hours ago
Comment by Ginden 21 hours ago
Why? Because in my experience, the bottleneck is in shareholders approving new features, not my ability to dish out code.
Comment by raihansaputra 21 hours ago
if i hit the limit usually i'm not using it well and hunting around. if i'm using it right i'm basically gassed out trying to hit the limit to the max.
Comment by solumunus 21 hours ago
Comment by rustystump 21 hours ago
Comment by elbasti 12 hours ago
We have a way of determining if Anthropic is, or has the capability of being profitable, and what the levers to that may be. AI may be world-changing, but the accounting principles behind AI labs are no different than those behind a Pizza Hut.
Even if the cost of "inference + serving" is lower than the cost of selling a token, the relevant question is what is the depreciation schedule of the cost of training. ie, if I spend $1 on training, how long do I have before I have to spend $1 again?
Almost certainly, any reasonable depreciation schedule of the cost of training will result in leading labs being presently wildly unprofitable. So the question is:
What can be done to make training depreciate more slowly? Perhaps users can be persuaded to stick around using non-frontier models for longer, although then there's a shift in the competitive landscape.
If users cannot be persuaded (forced?) to use legacy models, then the entire business model is thrown into question, because there's no reason why training frontier models would ever get cheaper: even if it gets cheaper on the margin, surely that will result in more compute used to generate an even "better" model, resulting in more spend in the aggregate.
This doesn't mean that the AI industry is "doomed". A couple of things could happen, and this is where the frontier labs should be focusing their attention:
1. They could find a way to climb up the value chain and capture more of the consumer surplus.
2. There could be a paradigm shift in compute architecture/compute cost.
3. We could reach a limit of marginal utility, shifting consumption to legacy models, thereby lengthening the depreciation/utility of training.
Edit: My assertion of "Almost certainly, any reasonable depreciation schedule of the cost of training will result in leading labs being presently wildly unprofitable." is made with no real information, just a gut feeling, and should not be taken seriously.
Comment by nr378 11 hours ago
However, the GAAP P&L tells the opposite story. You book $200M revenue in the same year you spend $1B training the next model, so you report an $800M loss. Next year you book $2B against $10B in training spend, reporting an $8B loss. The business looks like it's dying when every individual model generation actually generates a healthy profit.
That's actually Dario's answer to your depreciation question. If each cohort earns back its training cost within its natural lifespan (however short that lifespan is), the depreciation schedule is already baked in. The model doesn't need to live forever, it just needs to return more than it cost before the next one replaces it. Whether that's actually happening at Anthropic is a different question, and one we can't answer without audited financials, but it's the claim Dario makes (and seems entirely reasonable from a distance).
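A toy sketch of this cohort-vs-GAAP gap, using the illustrative numbers from the comments above (these are Dario's cartoon figures and the commenter's hypotheticals, not real Anthropic financials, and training spend is assumed to be expensed as incurred rather than capitalized):

```python
# Toy cohort model: each model generation is treated as a mini-company.
# A cohort is "profitable" if its lifetime revenue, net of inference
# cost, exceeds its own training cost - even though the GAAP P&L books
# the NEXT model's training spend against THIS year's revenue.
def cohort_profit(training_cost: float, lifetime_revenue: float,
                  inference_cost: float) -> float:
    """Per-cohort surplus, all figures in $M."""
    return lifetime_revenue - inference_cost - training_cost

def gaap_year_loss(this_year_revenue: float,
                   next_model_training_spend: float) -> float:
    """Simplified reported result: training expensed as incurred."""
    return this_year_revenue - next_model_training_spend

# Year 1: $200M revenue booked against $1B training the next model.
print(gaap_year_loss(200, 1_000))   # -> -800 (reported $800M loss)
# Yet the cohort that earned that $200M can itself be in the black:
print(cohort_profit(100, 200, 50))  # -> 50 ($50M cohort surplus)
```

The business looks healthy or dying depending entirely on which of these two lenses you apply, which is the crux of the disagreement in this subthread.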
Comment by elbasti 11 hours ago
And I admit that I made that assertion from my gut without actually knowing if it's true or not.
Comment by kikimora 5 hours ago
To the GAAP point - 200M or 1B or 10B is not a loss but cash converted into an asset. It won't affect the bottom line at all. Unless the company re-evaluates the asset and says it now costs 1M instead of 200M. That would hit the bottom line.
Comment by calvinmorrison 11 hours ago
so what happens in year 10 when Anthropic spends $10B on training and only returns $8B? they're cooked
Comment by Verdex 11 hours ago
It's an interesting story about how even though all metrics show massive losses actually they have massive gains.
Accounting is a rather mature field, so I figure that someone in the past has tried this stunt and there should probably be ways for dealing with it.
Or do they always flame out after losing all the money? Knowing the history here would be informative.
Comment by Verdex 11 hours ago
Comment by stusmall 10 hours ago
The problem is everyone along the line is incentivized to be aggressive with estimates (commissions for sales are bigger, public financials look better) and discouraged from correcting the estimates when they go wrong.
Estimating multi-year returns on frontier models looks harder than estimating returns on oil and gas projects in the 90s.
Comment by yunwal 10 hours ago
Comment by skybrian 11 hours ago
Comment by Avshalom 11 hours ago
He says "You paid $100 million and then it made $200 million of revenue. There's some cost to inference with the model, but let's just assume in this cartoonish cartoon example that even if you add those two up, you're kind of in a good state. So, if every model was a company, the model is actually, in this example is actually profitable. What's going on is that at the same time"
Importantly, you'll notice that he's talking revenue, and assumes that inference is cheap enough/profitable enough that 100M + Inference_Over_Lifetime < 200M
Comment by lovich 8 hours ago
Every single time a company comes around and goes "Actually GAAP is wrong, look at my new math that says we're good", it's led to much wailing and gnashing of teeth in the future when it inevitably isn't.
Comment by root_axis 8 hours ago
Yes, this is exactly why OpenAI and Anthropic are hyping AGI. If LLMs ever become good enough to replace workers, the first sign will be frontier model companies launching competitor businesses. It doesn't make sense to sell the formula for gold when you can just use it yourself.
> There could be a paradigm shift in compute architecture/compute cost.
Possible, but no signs of this on the horizon. If it does happen, it's impossible to predict when it will.
> We could reach a limit of marginal utility, shifting consumption to legacy models, thereby lengthening the depreciation/utility of training.
I'm not sure market dynamics will allow this any time soon. We seem to have already achieved a marginal utility equilibrium in terms of model size, so training new models on trending use-cases (e.g. synthetic data targeting tool calls, agentic workflows, computer use, etc) is really the driving force behind product differentiation. Nobody wants to admit "training new models isn't profitable" because that deflates the AGI singularity narrative that all this investment hinges on.
Comment by fritzo 11 hours ago
Comment by jchallis 11 hours ago
Comment by skybrian 11 hours ago
Maybe not? This is an argument that has to be made using numbers. We can't do the estimate without the numbers.
Comment by elbasti 11 hours ago
Comment by benlivengood 12 hours ago
Comment by freejazz 7 hours ago
Crazy that people can write sentences like this with a straight face these days.
Comment by lokar 12 hours ago
This is what the elites of the gilded age called "ruinous competition", and the solution today will be the same as back then: monopoly power. This has been the business plan of the tech VC industry for 25+ years.
Comment by lovich 8 hours ago
The models don't learn without training, and they have finite context windows. As software updates around the world, don't they have to be trained on the new information to stay up to date?
Comment by lokar 8 hours ago
It's partly about updating what it "knows", but more about keeping up with competitive pressure on capabilities.
Comment by freejazz 7 hours ago
Comment by lovich 8 hours ago
Maybe they can get to a “good enough” level where the next model isn’t 10x the price but if the business model requires ever increasing sizes to paper over the r&d costs from the previous set then I don’t understand how they would transition to profitability
Comment by overrun11 20 hours ago
This sloppy Forbes article has polluted the epistemic environment because now there's a source to point to as "evidence."
So yes this post author's estimation isn't perfect but it is far more rigorous than the original Forbes article which doesn't appear to even understand the difference between Anthropic's API costs and its compute costs.
Comment by mike_hearn 18 hours ago
The only thing these companies sell are tokens. That's their entire output. OpenAI is trying to build an ad business but it must be quite small still relative to selling tokens because I've not yet seen a single ad on ChatGPT. It's not like these firms have a huge side business selling Claude-themed baseball caps.
That means the cost of "inference" is all their costs combined. You can't just arbitrarily slice out anything inconvenient and say that's not a part of the cost of generating tokens. The research and training needed to create the models, the salaries of the people who do that, the salaries of the people who build all the serving infrastructure, the loss leader hardcore users - all of it is a part of the cost of generating each token served.
Some people look at the very different prices for serving open weights models and say: see, inference in general is cheap. But those costs are distorted by companies trying to buy mindshare by giving models away for free - and on top of that, both top labs keep claiming the Chinese are distilling them like crazy, including using many tactics to evade blocks! So apparently the cost of a model like DeepSeek is still partly being subsidized by OpenAI and Anthropic against their will. The cost of those tokens is higher than what's being charged, it's just being shifted onto someone else's books. Nice whilst it lasts, but this situation has been seen many times in the past and eventually people get tired of having costs externalized onto them.
For as long as firms are losing money whilst only selling tokens, that means those tokens are selling at a loss. To not sell tokens at a loss the companies would have to be profitable.
Comment by overrun11 17 hours ago
Comment by mike_hearn 16 hours ago
• Inference used for training? Modern training pipelines aren't just gradient descent, there's a ton of inference used in them too.
• Gradient descent itself?
• The CPUs and disks storing and managing the datasets?
• The web servers?
• The people paid to swap out failed components at the dc?
Let's say you try and define it to mean the same as unit economics - what does it cost you to add an additional customer vs what they bring in. There's still no way to do this calculation. It's like trying to compute the unit economics of a software company. Sure, if you ignore all the R&D costs of building the software in the first place and all the R&D costs of staying competitive with new versions, then the unit economics look amazing, but there's still plenty of loss-making software startups in the world.
Unit economics are a useful heuristic for businesses where there aren't any meaningful base costs required to stay in the game because they let you think about setup costs separately. Manufacturing toys, private education, farming... lots of businesses where your costs are totally dominated by unit economics. AI isn't like that.
Comment by overrun11 14 hours ago
> Does it include:
> Inference used for training? Modern training pipelines aren't just gradient descent, there's a ton of inference used in them too.
No because this is training and not inference. Just like how R&D costs for a drug aren't part of COGS either.
> Gradient descent itself?
No
> The CPUs and disks storing and managing the datasets?
Yes
> The web servers?
Yes
> The people paid to swap out failed components at the dc?
Yes to the extent they are swapping for inference and not training. If the same employees do both then the accountants will estimate what percent of their time is dedicated to each and adjust their cost accordingly.
Comment by mike_hearn 14 hours ago
For the rest, anyone can define and apply an accounting metric but that doesn't mean it tells you anything useful. If you look at the unit cost of any typical IP business it's nearly zero. Yet, many companies lose money on making movies, video games, apps and books.
Comment by torginus 12 hours ago
Comment by projektfu 11 hours ago
The API price should hopefully incorporate the capitalized cost of the hardware, the facility rent, the cost to train the model, the r&d, cost of sales, etc., to make it profitable.
Claude Code Max may be able to offer a good price by having a mix of higher and lower utilization of users and ignoring the fixed costs, treating it as a driver of API sales. But it doesn't make sense to essentially pay people to use it.
Comment by wasabi991011 13 hours ago
Comment by emtel 15 hours ago
When people say “selling at a loss” they mean negative unit economics. No one ever means this much more expansive definition you’ve invented.
Comment by landl0rd 15 hours ago
Comment by jeremyjh 16 hours ago
Comment by mike_hearn 15 hours ago
Comment by jeremyjh 6 hours ago
Comment by oneneptune 14 hours ago
Comment by trillic 12 hours ago
Comment by infecto 16 hours ago
You're right that all the other costs are critical to measuring the profitability of the business, but for such a young industry that's the unknown. Does training get cheaper? Do we hit a theoretical limit on training? Are there further optimizations to be had?
You don't take on large capex in an industrial business and then, in year one, argue that the business is doomed when you're selling the product above marginal cost but haven't yet recouped the costs that have been capitalized.
Comment by howmayiannoyyou 18 hours ago
- Amortized training costs.
- SG&A.
- Capex depreciation.
All the above impact profitability over various time horizons and have to be rolled into present and projected P&L and cash flow analysis.
Comment by ACCount37 16 hours ago
In part due to base model reuse and all the tricks like distillation. But mainly, due to how much inference the big providers happen to sell.
So, not the massive economic loss you'd need to push models away from being profitable. Capex and R&D take the cake there.
Comment by bodge5000 19 hours ago
There's quite a lot of evidence - no proof, I'd agree, but then there's no absolute proof I'm aware of to the contrary either, so I don't know where you're getting this from.
The two pieces of evidence I'm aware of are that 1) Anthropic doesn't want their subsidised plans being used outside of CC, which would imply that the money they're making off it isn't enough, and 2) last time I checked, API spending is capped at $5000 a month
Like I say, neither of these are proof, you can come up with reasonable arguments against them, but once again the same could be said for evidence on the contrary
Comment by Majromax 15 hours ago
Claude Code use-cases also differ somewhat from general API use, where the former is engineered for high cache utilization. We know from overall API costs (both Anthropic and OpenRouter) that cached inputs cost an order of magnitude less than uncached inputs, but OpenCode/pi/OpenClaw don't necessarily have the same kind of aggressive cache-use optimizations.
Vertically integrated stacks might also be able to have a first layer of globally shared KV cache for the system prompts, if the preamble is not user specific and changes rarely.
> 2) last time I checked, API spending is capped at $5000 a month
Per https://platform.claude.com/docs/en/api/rate-limits, that seems to only be true for general credit-funded accounts. If you contact Anthropic's sales team and set up monthly invoicing, there's evidently no fixed spending limit.
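The cache effect on Claude Code-style workloads can be sketched as a blended input price. The rates below are placeholders standing in for the roughly order-of-magnitude cached/uncached gap mentioned above, not actual billing rates:

```python
def blended_input_price(uncached_per_m: float, cached_per_m: float,
                        hit_rate: float) -> float:
    """Effective $/M input tokens at a given cache hit rate."""
    return hit_rate * cached_per_m + (1 - hit_rate) * uncached_per_m

# Placeholder prices: $5/M uncached, $0.50/M cached (10x ratio).
print(blended_input_price(5.0, 0.5, 0.0))  # -> 5.0 (no caching)
print(blended_input_price(5.0, 0.5, 0.9))  # -> ~0.95 (agentic loop, hot cache)
```

This is why a harness engineered for high cache utilization can serve the same context far cheaper than a naive client replaying the full prompt uncached each turn.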
Comment by overrun11 17 hours ago
I don't think this logically follows. An unlimited buffet doesn't let you resell all of the food out the backdoor. At some level of usage any fixed price plan becomes unprofitable.
I agree the 5k cap is interesting as evidence although as you said I suspect there are other reasons for it.
As for evidence against it: The Information reported that OpenAI and Anthropic are 30%+ gross margins for the last few years. Sam Altman and Dario have both claimed inference is profitable in various scattered interviews. Other experts seem to generally agree too. A quick search found a tweet from former PyTorch team member Horace He: https://x.com/typedfemale/status/1961197802169798775 and a response to it in agreement from Anish Tondwalkar former researcher at OpenAI and Google Brain.
Comment by IsTom 16 hours ago
Comment by BoredomIsFun 18 hours ago
Comment by davewritescode 16 hours ago
I think it’s fairly obvious that Anthropic is lighting cash on fire and focusing on whether or not they’re losing money per token on inference is missing the forest for the trees.
Tokens become less valuable when the models aren’t continuously trained and we have zero idea what Anthropic is paying for training.
Comment by barrell 19 hours ago
Comment by infecto 16 hours ago
We don’t have clear evidence either way but it heavily leans to API pricing at least covering inference cost. Models these days have less and less differentiation and for API use there must be some thought to compete on cost but it’s not going to be winner take all. They leap frog each other with each new model.
Comment by bob1029 19 hours ago
Comment by pier25 13 hours ago
Comment by anonzzzies 21 hours ago
Comment by scandox 20 hours ago
Comment by codemog 20 hours ago
Comment by ffsm8 19 hours ago
I wanted to believe that you're essentially trolling, but no - that service exists. And not an upstart; there is coverage going back several years.
Our societies are seriously fucked.
Comment by remyp 14 hours ago
Effectively, this means that I have to hire a dog sitter every time I leave the house without her, just like an infant. If dog tv could fix this problem for me it would create an enormous amount of economic value.
Comment by ovi256 15 hours ago
Comment by dathinab 11 hours ago
kinda like Netflix and YT have "fireplace" streams or how LG TVs can be setup as "digital picture frames" when not "actively" used
but it being a dedicated service people pay money for is something new for me too
Comment by nemo44x 16 hours ago
Comment by 4ndrewl 16 hours ago
Comment by fennecfoxy 19 hours ago
Comment by blitzar 16 hours ago
Comment by lukan 20 hours ago
Comment by sva_ 18 hours ago
Comment by neamar 21 hours ago
Comment by behehebd 18 hours ago
Comment by tcbrah 16 hours ago
Comment by kleton 13 hours ago
model            completions  read      write    cached_read  cache_write
claude-opus-4-6  11000        16900000  5840000  1312000000   66120000
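For rough scale, those token counts can be priced at assumed per-million rates. The base input/output rates match the $5/$25 figures used in the sibling comment; the cache-read 0.1x and cache-write 1.25x multipliers are my assumption, not confirmed billing:

```python
# kleton's reported Opus token usage, priced at assumed $/M rates.
usage = {
    "read": 16_900_000,            # uncached input tokens
    "write": 5_840_000,            # output tokens
    "cached_read": 1_312_000_000,  # cache hits
    "cache_write": 66_120_000,     # cache writes
}
price_per_m = {                    # assumed rates, $ per million tokens
    "read": 5.00,
    "write": 25.00,
    "cached_read": 0.50,           # 0.1x base input (assumption)
    "cache_write": 6.25,           # 1.25x base input (assumption)
}
cost = sum(usage[k] / 1e6 * price_per_m[k] for k in usage)
print(f"${cost:,.2f}")  # -> $1,299.75
```

Under these assumptions the cached reads dominate the bill, which is the point of the caching discussion elsewhere in the thread: without cache discounts, 1.3B input tokens at the base rate would cost several times more.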
Comment by tgrowazay 9 hours ago
$5x17+$25x6=$235 for Opus 4.6
$2x17+$12x6=$106 for Gemini 3 Pro
$0.60x17+$3.6x6=$31.80 for Qwen3.5 397B-A17B via Huggingface API
Comment by jychang 21 hours ago
Ask Opus to figure out how much it would cost. Lol.
Comment by aweb 20 hours ago
Comment by ffsm8 19 hours ago
So getting Claude Code subscriptions for developers should be permissible and not be against anything... However, if you created a REST endpoint to e.g. run a preconfigured prompt as part of your platform, that'd be against it
But I'm neither a lawyer nor work for anthropic
Comment by anonzzzies 19 hours ago
Comment by ValentineC 7 hours ago
Sorry, but could you clarify what this means?
Comment by ffsm8 2 hours ago
Expressed differently: are you an individual using an official Anthropic application interactively? You're fine.
You're using it unattended, without an individual holding the reins? You should probably talk with a lawyer about whether that's permissible.
Again, IANAL nor do I work for anthropic
Comment by alex_c 16 hours ago
Claude Code has a Teams plan which includes Max tiers. Why would it be forbidden?
Comment by ValentineC 7 hours ago
Could you quote the relevant part that you think forbids it for us?
Comment by sunaurus 19 hours ago
Comment by anonzzzies 19 hours ago
Comment by quikoa 20 hours ago
Comment by bloppe 20 hours ago
Comment by KptMarchewa 17 hours ago
Comment by bdangubic 16 hours ago
Comment by itintheory 14 hours ago
"If you click the thumbs up button to rate a chat, the AI provider will use the contents for training, so our company's policy is never to click the thumbs up button"
That seemed so farcical I had a hard time taking this person seriously. Enterprise plans must give some strong guarantees around data usage, right?
Comment by addandsubtract 12 hours ago
Comment by bdangubic 10 hours ago
Comment by ziml77 8 hours ago
Comment by bdangubic 12 hours ago
How many companies today don’t have an “AI strategy” and fear being left behind, etc.? In my small circle we went from “most are not using AI” to “none are not using AI” in a somewhat short period of time
Comment by osener 18 hours ago
This is the relevant quote from the original article.
Comment by ranyume 15 hours ago
Comment by eaglelamp 22 hours ago
Anthropic's models may be similar in parameter size to models on OpenRouter, but none of the others are in the headlines nearly as much (especially recently), so the comparison is extremely flawed.
The argument in this article is like comparing the cost of a Rolex to a random brand of mechanical watch based on gear count.
Comment by d1sxeyes 22 hours ago
Are Anthropic currently unable to sell subscriptions because they don’t have capacity?
Comment by mike_hearn 14 hours ago
Comment by d1sxeyes 11 hours ago
Doing a lot of heavy lifting here. Not everyone on a subscription plan would convert to a 200USD/mo API consumer.
Comment by eru 21 hours ago
Comment by MaxikCZ 21 hours ago
Absolutely! I'm currently paying $170 to Google to use Opus in Antigravity without limit in full agent mode, because I tried Anthropic's $20 subscription and busted my limit within a single prompt. I'm not gonna pay them $200 only to find out I hit the limit after 20 or even 50 prompts.
And after 2 more months my price is going to double to over $300, and I still have no intention of even trying the 20x Max plan, if it's really just 20x more prompts than Pro.
Comment by dtech 20 hours ago
Comment by MaxikCZ 20 hours ago
Comment by esrauch 20 hours ago
They have a business model and are trying to capture more revenue; letting you fully saturate their compute isn't obviously a good business strategy.
Comment by cicko 19 hours ago
Comment by Aeolun 22 hours ago
Comment by bob1029 21 hours ago
I think it's the other way around? Sparse use of GPU farms should be the more expensive thing. Full saturation means that we can exploit batching effects throughout.
Comment by eaglelamp 13 hours ago
If you own equity in Anthropic you should care about that cost. Maybe you are willing to tolerate it to win market share, but for you to make the most profit you need that cost to shrink.
Comment by nottorp 20 hours ago
The entertainment industry. They still tell you about how much money they're leaving on the table because people pirate stuff.
What would happen in reality for entertainment is people would "consume" far less "content".
And what would happen in reality for Anthropic is people would start asking themselves if the unpredictability is worth the price. Or at best switch to pay as you go and use the API far less.
Comment by KronisLV 22 hours ago
Comment by eru 21 hours ago
Comment by the_gipsy 19 hours ago
Comment by NooneAtAll3 22 hours ago
I mean... Rolex is an overpriced brand whose cost to consumers is mainly just marketing. Its production cost is nowhere close to the selling price, and looking at gears is a fair way of evaluating that
Comment by fragmede 21 hours ago
When has production cost had anything to do with selling price?
Comment by eru 21 hours ago
Comment by YetAnotherNick 22 hours ago
The only thing that matters is whether the users would have paid $5000 if they didn't have the option to buy a subscription. And I highly doubt they would have.
Comment by ymaws 1 day ago
Comment by Bolwin 23 hours ago
I'm sure Anthropic is making money off the API but I highly doubt it's 90% profit margins.
Comment by jychang 22 hours ago
Unlikely. Amazon Bedrock serves Opus at 120 tokens/sec.
If you want to estimate "the actual price to serve Opus", a good rough estimate is to find the price max(Deepseek, Qwen, Kimi, GLM) and multiply it by 2-3. That would be a pretty close guess to actual inference cost for Opus.
It's impossible for Opus to be something like 10x the active params as the chinese models. My guess is something around 50-100b active params, 800-1600b total params. I can be off by a factor of ~2, but I know I am not off by a factor of 10.
Comment by simianwords 21 hours ago
Comment by jychang 21 hours ago
Comparing tps ratios- by saying a model is roughly 2x faster or slower than another model- can tell you a lot about the active param count.
I won't say it'll tell you everything; I have no clue what optimizations Opus may have, which can range from native FP4 experts to spec decoding with MTP to whatever. But considering chinese models like Deepseek and GLM have MTP layers (no clue if Qwen 3.5 has MTP, I haven't checked since its release), and Kimi is native int4, I'm pretty confident that there is not a 10x difference between Opus and the chinese models. I would say there's roughly a 2x-3x difference between Opus 4.5/4.6 and the chinese models at most.
Comment by fc417fc802 20 hours ago
> Comparing tps ratios- by saying a model is roughly 2x faster or slower than another model- can tell you a lot about the active param count.
You sure about that? I thought you could shard between GPUs along layer boundaries during inference (but not training obviously). You just end up with an increasingly deep pipeline. So time to first token increases but aggregate tps also increases as you add additional hardware.
Comment by jychang 20 hours ago
Hint: what's in the kv cache when you start processing the 2nd token?
And that's called layer parallelism (as opposed to tensor parallelism). It allows you to run larger models (pooling vram across gpus) but does not allow you to run models faster.
Tensor parallelism DOES allow you to run models faster across multiple GPUs, but you're limited to how fast you can synchronize the all-reduce. And in general, models would have the same boost on the same hardware- so the chinese models would have the same perf multiplier as Opus.
Note that providers generally use tensor parallelism as much as they can, for all models. That usually means 8x or so.
In reality, tps ends up being a pretty good proxy for active param size when comparing different models at the same inference provider.
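As a toy model of why that proxy works (all numbers hypothetical, roughly H100-class HBM bandwidth with 8-way tensor parallelism):

```python
# Toy model: decode is roughly memory-bandwidth bound, so per-token time
# scales with the bytes of active params streamed per token. Bandwidth
# and TP degree here are hypothetical, purely for illustration.
def decode_tps(active_params_b, bytes_per_param=1.0, hbm_tb_s=3.35, tp=8):
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return (hbm_tb_s * 1e12 * tp) / bytes_per_token

# Same hardware, same TP degree: the tps ratio is just the inverse
# active-param ratio, regardless of the absolute bandwidth figure.
ratio = decode_tps(32) / decode_tps(100)
print(ratio)  # 3.125
```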
Comment by fc417fc802 19 hours ago
Comment by nbardy 21 hours ago
The Trillions of parameters claim is about the pretraining.
It’s most efficient in pre training to train the biggest models possible. You get sample efficiency increase for each parameter increase.
However those models end up very sparse and incredibly distillable.
And it’s way too expensive and slow to serve models that size so they are distilled down a lot.
Comment by wongarsu 19 hours ago
Since then, inference pricing for new models has come down a lot, despite increasing pressure to be profitable. Opus 4.6 costs 1/3rd what Opus 4.0 (and 3.5) cost, and GPT 5.4 costs 1/4th what o1 cost. You could take that as an indication that inference costs have also come down by at least that degree.
My guess would have been that current frontier models like Opus are in the realm of 1T params with 32B active
Comment by aurareturn 22 hours ago
Comment by daemonologist 23 hours ago
Comment by johndough 22 hours ago
42 tps for Claude Opus 4.6 https://openrouter.ai/anthropic/claude-opus-4.6
143 tps for GLM 4.7 (32B active parameters) https://openrouter.ai/z-ai/glm-4.7
70 tps for Llama 3.3 70B (dense model) https://openrouter.ai/meta-llama/llama-3.3-70b-instruct
For GLM 4.7, that makes 143 * 32B = 4576B parameters per second, and for Llama 3.3, we get 70 * 70B = 4900B, which makes sense since denser models are easier to optimize. As a lower bound, we get 4576B / 42 ≈ 109B active parameters for Opus 4.6. (This makes the assumption that all three models use the same number of bits per parameter and run on the same hardware.)
Comment by jychang 22 hours ago
I'd say Opus is roughly 2x to 3x the price of the top Chinese models to serve, in reality.
Comment by codemog 1 day ago
Comment by Chamix 22 hours ago
Of course, intense sparsification via MoE (and other techniques ;) ) lets total model size largely decouple from inference speed and cost (within the limit of world size via NVLink/TPU torus caps)
So the real mystery, as always, is the actual parameter count of the activated head(s). You can do various speed benchmarks and TPS tracking across likely hardware fleets, and while an exact number is hard to compute, let me tell you, it is not 17B or anywhere in that particular OOM :)
Comparing Opus 4.6 or GPT 5.4 thinking or Gemini 3.1 Pro to any sort of Chinese model (on cost) is just totally disingenuous when China does NOT have Vera Rubin NVL72 GPUs or Ironwood V7 TPUs in any meaningful capacity, and is forced to target 8-GPU Blackwell systems (and worse!) for deployment.
Comment by jychang 21 hours ago
Opus is 2T-3T in size at most.
Comment by Chamix 12 hours ago
You have presented a vibe-based rebuttal with no evidence or logic to outline why you think labs are still stuck in the single trillions of parameters (GPT-4 was ~1 trillion params!). Though, you have successfully cunninghammed me into saying that while anything I publicly state is derived from public info, working in the industry itself is a helpful guide to point at the right public info to reference.
Comment by johndough 11 hours ago
> and while an exact number is hard to compute, let me tell you, it is not 17B or anywhere in that particular OOM :)
I can see ~100B, but that would be near the same order of magnitude. I find ~1000B active parameters hard to believe.
Comment by Chamix 10 hours ago
4o and other H100 era models did indeed drop their activated heads far smaller than gpt-4 to the 10s just like current Hopper-Era Chinese open-source, but it went right back up again post-Blackwell with the 10x L2 bump (for kv cache) in congruence with nlogn attention mechanisms being refined. Similar story for Claude.
The fun speculation is wondering about the true size of Gemini 3's internals, given the petabyte+ world size of their homefield IronwoodV7 systems and Jim Keller's public penchant for envisioning extreme MoE-like diversification across hundreds of dedicated sub-models constructed by individual teams within DeepMind.
Comment by johndough 19 hours ago
Comment by magicalhippo 18 hours ago
From my understanding, the "besides training" is a big issue. As I noted earlier[1], Qwen3 was much better than Qwen2.5, but the main difference was just more and better training data. The Qwen3.5-397B-A17B beat their 1T-parameter Qwen3-Max-Base, again a large change was more and better training data.
Comment by aurareturn 22 hours ago
Comment by Chamix 22 hours ago
However, I'd say its relatively well assumed in realpolitik land that Chinese labs managed to acquire plenty of H100/200 clusters and even meaningful numbers of B200 systems semi-illicitly before the regulations and anti-smuggling measures really started to crack down.
This does somewhat beg the question of how nicely the closed-source variants, of undisclosed parameter counts, fit within the 1.1TB of H200 or 1.5TB of B200 systems.
Comment by aurareturn 21 hours ago
Comment by Chamix 21 hours ago
Comment by aurareturn 21 hours ago
Comment by 0xbadcafebee 14 hours ago
That said, for inference, the margins for OpenAI were estimated at 70% [1] [2], and the margins for Anthropic were estimated between 90% and 40% [3] [4], last year. They will not be profitable for years.
[1] https://phemex.com/news/article/openais-ai-profit-margin-cli... [2] https://www.saastr.com/have-ai-gross-margins-really-turned-t... [3] https://www.theinformation.com/articles/anthropic-projects-7... [4] https://www.investing.com/news/stock-market-news/anthropic-t...
Comment by vessenes 13 hours ago
Profit implies a GAAP accrual of some sort. On any accrual schedule tied to reality, the companies are profitable now - that is, inference margin on each given model has more than paid for capital costs of training and deploying those models.
That the companies get to show a loss is a feature of cash-basis accounting: they made $100m net on that last model? Good news, We’re spending $1b on the next! Infinite tax losses!
The companies will not be cashflow positive for years. Why does this persnickety difference matter? It matters to me because I care about the engineers here - and they seem collectively likely to either short every AI company IPOing, or just quietly ignore AI impact on their livelihood, or head off into a corner and go catatonic - all based on a worldview that “this is collective insanity and everything here is going to eventually go bankrupt” — none of those are good outcomes. Shorting might be, but it should be done judiciously, and understanding the financial factors at play. So, anyway, long plea over - but, allow me to plead: cashflow positive if you want to make the point you were making.
Comment by nickcoffee 1 hour ago
The more interesting question is where the margins go as inference costs keep dropping. At some point the pricing pressure flows to users.
Comment by himata4113 21 hours ago
If you remove the cached token cost from pricing, the overall API usage drops from around $5000 to $800 (or $200 per week) on the $200 max subscription. Still 4x cheaper than the API, but not losing money either - if I had to guess, it's break-even, as the compute is most likely going idle otherwise.
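Roughly, using the token counts from kleton's table upthread. The $5/$25 Opus prices and the cache-read (0.10x input) and cache-write (1.25x input) multipliers are assumptions, not quotes from a price list:

```python
# API-equivalent cost for the heavy user upthread, with and without
# cache pricing. All prices and multipliers are assumptions; check the
# current price list before relying on them.
IN_PRICE, OUT_PRICE = 5.00, 25.00   # $ per million tokens (assumed)
read, write = 16.9, 5.84            # millions of tokens
cached_read, cache_write = 1312.0, 66.12

base = read * IN_PRICE + write * OUT_PRICE
no_discount = base + (cached_read + cache_write) * IN_PRICE
with_cache = (base + cached_read * IN_PRICE * 0.10
                   + cache_write * IN_PRICE * 1.25)

print(f"cached tokens billed at full input price: ${no_discount:,.0f}")
print(f"with cache pricing:                       ${with_cache:,.0f}")
```

Same ballpark as the figures above; most of the headline number is cached reads billed at a steep discount.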
Comment by mike_hearn 18 hours ago
The gamble with caching is to hold a KV cache in the hope that the user will (a) submit a prompt that can use it and (b) that will get routed to the right server which (c) won't be so busy at the time it can't handle the request. KV caches aren't small so if you lose that bet you've lost money (basically, the opportunity cost of using that RAM for something else).
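A toy break-even for that bet, with entirely hypothetical numbers:

```python
# Toy break-even for the caching bet: hold the KV cache in memory hoping
# for a hit, vs recompute via fast parallel prefill. Every number here
# is hypothetical, purely to illustrate the trade-off.
kv_cache_gb = 20            # KV cache for a long agent context
ram_cost_per_gb_hr = 0.01   # opportunity cost of keeping it resident, $
recompute_cost = 0.05       # $ to re-prefill the prompt from scratch

# Holding only pays off if a cache hit arrives before the holding cost
# exceeds the recompute cost:
breakeven_hours = recompute_cost / (kv_cache_gb * ram_cost_per_gb_hr)
print(breakeven_hours * 60)  # 15.0 (minutes)
```

Which is one way to see why providers keep cache TTLs short: past a few minutes of inactivity, recomputing is the cheaper side of the bet.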
Comment by otterley 15 hours ago
Comment by mike_hearn 15 hours ago
https://developers.openai.com/api/docs/guides/prompt-caching...
> When using the in-memory policy, cached prefixes generally remain active for 5 to 10 minutes of inactivity, up to a maximum of one hour. In-memory cached prefixes are only held within volatile GPU memory.
You can opt-in to storing the caches on local disk but it's not the default. I haven't done the calculations for why they do this, but given that disaggregated parallel prefill and RDMA can recompute the KV cache very fast, you'd need a huge amount of bandwidth from disk to beat it (and flash drives wear out!).
Comment by criemen 19 hours ago
I'm incredibly salty about this - they're essentially monetizing intensely something that allows them to sell their inference at premium prices to more users - without any caching, they'd have much less capacity available.
Comment by eru 21 hours ago
Why would it go idle? It would go to their next best use. At least they could help with model training or let their researchers run experiments etc.
Comment by himata4113 21 hours ago
Training currently requires Nvidia's latest and greatest for the best models (they also use Google TPUs now, which are also technically the latest and greatest? However, they're more of a dual-purpose chip than anything, afaik, so that would be a correct assessment in that case)
Inference can run on a hot potato if you really put your mind to it
Comment by eru 21 hours ago
I am not saying this would be a great use of their compute, but idle is far from the only alternative. (Unless electricity is the binding constraint?)
Comment by himata4113 21 hours ago
Comment by eru 19 hours ago
Huh, what? You know you can turn off unused equipment, and at least my Nvidia GPU can draw more or fewer watts even when turned on?
Or does Anthropic have a flatline deal for electricity and cooling?
Comment by rafaelmn 19 hours ago
Comment by agenthustler 16 hours ago
Comment by z3ugma 1 day ago
Comment by lovecg 23 hours ago
Comment by crakhamster01 21 hours ago
Maybe the common factor here is not having deep/sufficient knowledge on the topic being discussed? For the article I mentioned, I feel like I was less focused on the strength of the writing and more on just understanding the content.
LLMs are very capable at simplifying concepts and meeting the reader at their level. Personally, I subscribe to the philosophy of - "if you couldn't be bothered to write it, I shouldn't bother to read it".
Comment by ajkjk 20 hours ago
Comment by amonith 18 hours ago
I just don't know what's supposed to be natural writing anymore. It's not in the books, disappears from the internet, what's left? Some old blogs for now maybe.
Comment by crakhamster01 13 hours ago
But luckily there's a large body of well written books/blogs/talks/speeches out there. Also anecdotally, I feel like a lot of the "bad writing" I see online these days is usually in the tech sphere.
Comment by juuular 14 hours ago
Comment by weird-eye-issue 23 hours ago
Comment by lovecg 23 hours ago
“what X actually is”
“the X reality check”
Overuse of “real” and “genuine”:
> The real story is actually in the article. … And the real issue for Cursor … They have real "brand awareness", and they are genuinely better than the cheaper open weights models - for now at least. It's a real conundrum for them.
> … - these are genuinely massive expenses that dwarf inference costs.
This style just screams “Claude” to me.
Comment by hansvm 23 hours ago
Comment by lelanthran 22 hours ago
It has enough tells in the correct frequency for me to consider it more than 50% generated.
Comment by NetOpWibby 23 hours ago
Comment by raincole 19 hours ago
Comment by Erem 23 hours ago
Comment by 152334H 23 hours ago
Popular content is popular because it is above the threshold for average detection.
In a better world, platforms would empower defenders, by granting skilled human noticers flagging priority, and by adopting basic classifiers like Pangram.
Unfortunately, mainstream platforms have thus far not demonstrated strong interest in banning AI slop. This site in particular has actually taken moderation actions to unflag AI slop, on certain occasions...
Comment by rhubarbtree 21 hours ago
Comment by weird-eye-issue 21 hours ago
Comment by rhubarbtree 6 hours ago
Comment by weird-eye-issue 3 hours ago
Comment by faangguyindia 20 hours ago
And APIs are the on-demand service equivalent.
Priority is given to APIs, and leftover compute is used by subscription plans.
When there is no capacity, subscriptions are routed to highly quantized cheaper models behind the scenes.
Selling subscriptions makes it cheaper to run such inference at scale; otherwise, much of your capacity is just sitting there idle.
Also, these subscriptions help them train their models further on predictable workflows (because the model creators also control the client, like Qwen Code, Claude Code, Antigravity etc...)
This is probably why they will ban you for violating the TOS if you use their subscription service with other tools.
They aren't just selling a subscription; the subscription also helps them become better at the thing they are selling, which is coding, for coding models like Qwen, Claude etc...
I've used qwen code, codex and claude.
Codex is 2x better than Qwen code and Claude is 2x better than Codex.
So I'd hope Claude Opus is at least 4-5x more expensive to run than the flagship Qwen Code model hosted by Alibaba.
Comment by popcorncowboy 20 hours ago
This hasn't been true in a long time.
Comment by epolanski 19 hours ago
In fact I'm more and more inclined to run my own benchmarks from now on, because I seriously distrust those I see online.
Even if the benchmarks are indeed valid, they just don't reflect my use cases, usages and ability to navigate my projects and my dependencies.
Comment by Huppie 19 hours ago
Maybe that's just CLAUDE.md and memory causing the difference of course.
As a matter of preference however I like the way Claude Code works just a lot better, instructing it to work with parallel subagents in work trees etc. just matches the way I think these things should work I guess.
Comment by elAhmo 20 hours ago
Comment by janalsncm 19 hours ago
Have they announced this?
Comment by nl 19 hours ago
No and indeed they have said they never do this at all.
Comment by sieabahlpark 20 hours ago
Comment by n_u 23 hours ago
1. It would be nice to define terms like RSI or at least link to a definition.
2. I found the graph difficult to read. It's a computer font that is made to look hand-drawn and it's a bit low resolution. With some googling I'm guessing the words in parentheses are the clouds the model is running on. You could make that a bit more clear.
Comment by brianjeong 1 day ago
Comment by skybrian 23 hours ago
Comment by jeff_antseed 21 hours ago
anthropic doesn't have that. single provider, single pricing decision. whether or not $5k is accurate the more interesting question is what happens to inference pricing when the supply side is genuinely open. we're seeing hints of it with open router but its still intermediated
not saying this solves anthropic's cost problem, just that the "what does inference actually cost" question gets a lot more interesting when providers are competing directly
Comment by aurareturn 23 hours ago
I thought there was no moat in AI? Even being 10x costlier, Anthropic still doesn't have enough compute to meet demand.
Those "AI has no moat" opinions are going to be so wrong so soon.
Comment by spiderice 22 hours ago
So no, Claude would not be getting NEARLY as much usage as it's currently getting if it weren't for the $100/$200 monthly subscription. You're comparing Kimi to the price that most people aren't paying.
Comment by aurareturn 42 minutes ago
Comment by jdjfnfndn 19 hours ago
Comment by dreis_sw 12 hours ago
If they never go public, there's our answer as well.
Comment by readthemanual 17 hours ago
Comment by martinald 17 hours ago
Comment by functionmouse 1 day ago
Comment by versteegen 22 hours ago
[1] https://www.wheresyoured.at/anthropic-is-bleeding-out/ [2] https://www.wheresyoured.at/costs/
Comment by sunaurus 19 hours ago
> this company is wilfully burning 200% to 3000% of each Pro or Max customer that interacts with Claude Code
There is of course this meme that "Anthropic would be profitable today if they stopped training new models and only focused on inference", but people on HN are smart enough to understand that this is not realistic due to model drift, and also due to competition from other models. So training is forever a part of the cost of doing business, until we have some fundamental changes in the underlying technology.
I can only interpret Ed Zitron as saying "the cost of doing business is 200% to 3000% of the price users are paying for their subscriptions", which sounds extremely plausible to me.
Comment by simianwords 22 hours ago
Comment by beepbooptheory 14 hours ago
Like I wish it was simple as "if it wasn't viable, they wouldn't be in business," but alas that argument is kinda the more naive one in this world. Right?
Or is there some intuition about energy/cost here all the dump posters miss, that you could tell us about?
Please, anything, my company is dying.
Comment by simianwords 11 hours ago
It's so strange to see people still think costs are not going down...
Comment by dimgl 1 day ago
Comment by crazygringo 1 day ago
> My LinkedIn and Twitter feeds are full of screenshots from the recent Forbes article on Cursor claiming that Anthropic's $200/month Claude Code Max plan can consume $5,000 in compute.
Comment by fulafel 22 hours ago
So the article's title is obviously sensationalized.
Comment by vidarh 21 hours ago
Also, while Opus certainly is a lot better than even the best Chinese models, when I max out my Claude plan, I make do with Kimi 2.5. When factoring in the re-runs of changes because of the lower quality, I'd spend maybe 2x as much per unit of work if I were to pay token prices for all my monthly use w/Kimi.
I'd still prefer Claude if the price comes down to 1x, as it's less hassle w/the harder changes, but their lead is effectively less than a year.
Comment by crazygringo 15 hours ago
The title does not seem sensationalized. It's literally a summary of the article.
Comment by fulafel 13 hours ago
The title is refuting a strawman argument that wasn't actually made, and that the article itself doesn't claim was made.
Comment by crazygringo 12 hours ago
The argument was literally made in Forbes. It's linked to. What are you on about?
Is there something I'm missing here?
Comment by fulafel 12 hours ago
Which is different from actually costing 5k in tokens per Claude Code user, as users won't max out their subscriptions. And there doesn't seem to be any stronger claim elsewhere in the article.
But the title is about a strawman that it would cost Anthropic 5k per user which it seems nobody claimed.
Comment by crazygringo 11 hours ago
But headlines are short. This is so common even in mainstream news, I can't really complain about it. Especially when the full claim with "up to" is printed in the very first paragraph.
And the entire point of the article is not about which users max out their subscriptions. It's about conflating retail prices with actual costs.
So maybe the headline would be more accurate with "up to" in it, but the article itself is totally fine, and does not hinge on that distinction. The article is certainly not about a strawman.
Comment by ineedaj0b 20 hours ago
…You could take efficiency improvement rates from previous model releases (from x -> y) and assume they have already made “improvements” internally. This is likely closer to what their real costs are.
Comment by WhitneyLand 15 hours ago
Cursor seems to be in a tough spot. Just heard the swix podcast on their big new cloud agents thing, and it’s looking like a pretty small moat these days.
Comment by hattmall 23 hours ago
Comment by tom_m 2 hours ago
Comment by A7OM 17 hours ago
Comment by gmerc 1 day ago
Comment by arthurcolle 23 hours ago
Comment by rs_rs_rs_rs_rs 22 hours ago
Comment by maxdo 16 hours ago
Comment by tartoran 15 hours ago
Comment by maxdo 13 hours ago
Yeah , I tried gasTown. Not using it extensively.
Comment by tartoran 15 hours ago
Comment by amelius 15 hours ago
Comment by tartoran 14 hours ago
Comment by vbezhenar 18 hours ago
Comment by hobofan 18 hours ago
API inference access is naturally a lot more costly to provide compared to Chat UI and Claude Code, as there is a lot more load to handle with less latency. In the products they can just smooth over load curves by handling some of the requests slower (which the majority of users in a background Code session won't even notice).
Comment by timmmmmmay 12 hours ago
now, the consensus of the commentards on this website, who don't have access to any of Anthropic's financial data, is that the monthly subscriptions are a money loser!
so either the leading AI company's business dev team is wrong or the Jacker News comment section is wrong, it is a mystery
Comment by preommr 16 hours ago
Comment by akhrail1996 19 hours ago
I wonder if a better proxy would be comparing by capability level rather than size. The cost to go from "good" to "frontier" is probably exponential, not linear - so estimating Anthropic's real cost from what it takes to serve Qwen 397B seems off.
Comment by scuff3d 22 hours ago
Alibaba is the primary comparison point made by the author, but it's a completely unsuitable comparison. Alibaba is closer to AWS than Anthropic in terms of business model. They make money selling infrastructure, not inference. It's entirely possible they see inference as a loss leader, and are willing to offer it at cost or below to drive people onto the platform.
We also have absolutely no idea if it's anywhere near comparable to Opus 4.6. The author is guessing.
So the article's primary argument is based on a comparison to a company with an entirely different business model, running a model the author is just making wild guesses about.
Comment by simianwords 21 hours ago
Comment by ajstars 14 hours ago
Comment by steveBK123 15 hours ago
In the real world ..
Where I work, AI is used heavily, we are already tipping into cost management mode at a firm level. Users are being aggressively steered to cheaper models, usage throttled, and cost attribution reports sent. This is already being done at the under-$1k/mo per user cost level. So some indications of revenue per user leveling out already.
Meanwhile everyone I know who works anywhere near a computer has had AI shoved down their throat, with training, usage KPIs, annual goal setting and mandated engagement. So we are already pretty saturated, it's not like theres giant new frontiers of new users.
Comment by behehebd 18 hours ago
Comment by vmykyt 18 hours ago
People in the comments assume Anthropic's model is 10 times bigger than the Chinese models, so the calculated cost is 10 times more.
But from a Big O perspective, only a few algorithms give you O(N). Most highly optimized things are O(N log N).
So what is the big O, for a single request, of any open model?
Comment by fancyfredbot 18 hours ago
However I think it's fair to say the cost is roughly linear in the number of users other than that.
There may be some aspects which are not quite linear when you see multiple users submitting similar queries... But I don't think this would be significant.
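As a toy model of that split (all constants made up): per-request cost grows superlinearly with context length because of attention, but fleet cost stays essentially linear in the number of users.

```python
# Hypothetical cost model: O(n) FLOPs-per-token term plus an O(n^2)
# attention term per request; fleet cost is users x requests x request
# cost. Constants are invented purely to illustrate the scaling shape.
def request_cost(prompt_tokens, output_tokens, c_linear=1e-6, c_attn=1e-12):
    n = prompt_tokens + output_tokens
    return c_linear * n + c_attn * n * n

def fleet_cost(users, requests_per_user=100, avg_prompt=20_000, avg_out=2_000):
    return users * requests_per_user * request_cost(avg_prompt, avg_out)

# Doubling users doubles cost: linear in users, whatever the per-request
# complexity is.
print(fleet_cost(2) / fleet_cost(1))  # 2.0
```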
Comment by rat9988 18 hours ago
As for LLM, there is probably some cost constant added once it can fit on a single GPU, but should probably be almost linear.
Comment by darkwater 21 hours ago
Comment by otterley 15 hours ago
Comment by ramesh31 15 hours ago
Comment by zurfer 16 hours ago
Which is probably a lot more correct than other claims. However, it's also true that anybody who has to use the API might pay that much, creating a real cost-per-token moat for Anthropic's Claude Code vs other models, as long as they are so far ahead in terms of productivity.
Comment by d--b 18 hours ago
$200 worth of actual computation is an awful lot of computation.
Comment by lyu07282 19 hours ago
Comment by sheepscreek 13 hours ago
> Qwen 3.5 397B-A17B is a good comparison point. It's a large MoE model, broadly comparable in architecture size to what Opus 4.6 is likely to be.
I stopped reading here. Frontier models have been rumoured to be in TRILLIONS of parameters since the days of GPT-4. Besides, with agents, I think they’re using more specialized models under the hood for certain tasks like exploration and web searches.
So while their cost won’t be $5000 or anywhere close, I still think it would be in the hundreds for heavy users. They may very well be losing money to the top 5-10% MAX users. Their real margin likely comes from business API customers.
Here’s an interesting bit - OpenAI filed a document with the SEC recently that gave us a peek into its finances. The cost of all infrastructure stood at just ~30% of all revenue generated. That is a phenomenal improvement. I fell off the chair when I first learned that.
Comment by bhekanik 14 hours ago
Comment by amelius 20 hours ago
> Anthropic is looking at approximately $500 in real compute cost for the heaviest users.
Comment by beepbooptheory 1 day ago
Comment by scriptsmith 23 hours ago
Comment by oefrha 23 hours ago
Comment by beepbooptheory 14 hours ago
Comment by arthurcolle 23 hours ago
but $5 that I amortize over 7 years might end up being $1.7 maybe if I don't rapidly combust (supply chain risk)
Comment by dimava 20 hours ago
Everyone else pays them at API prices
Comment by fnord77 23 hours ago
Aren't they losing money on the retail API pricing, too?
> ... comparisons to artificially low priced Chinese providers...
Yeah, no this article does not pass the sniff test.
Comment by versteegen 22 hours ago
No, they aren't, and probably neither is anyone else offering API pricing. And Anthropic's API margins may be higher than anyone else.
For example, DeepSeek released numbers showing that R1 was served at approximately "a cost profit margin of 545%" (meaning 82% of revenue is profit), see my comment https://news.ycombinator.com/item?id=46663852
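For what it's worth, the conversion works out like this (my arithmetic, not DeepSeek's; "margin" definitions vary, which is probably where the ~82% figure comes from):

```python
# DeepSeek's reported "cost profit margin of 545%" means profit = 5.45x
# cost. Converting to profit as a share of revenue:
margin_over_cost = 5.45
profit_share = margin_over_cost / (1 + margin_over_cost)
print(f"{profit_share:.1%}")  # 84.5%
```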
Comment by bandrami 21 hours ago
Comment by vidarh 21 hours ago
Comment by aurareturn 20 hours ago
Comment by bandrami 19 hours ago
Eh. We don't really know that, and the people saying that have an interest in the rest of the world believing it's true.
Comment by aurareturn 18 hours ago
Comment by bandrami 15 hours ago
Comment by aurareturn 10 hours ago
Comment by secondary_op 15 hours ago
Comment by AussieWog93 7 hours ago
In my case, the access logs alone from bots scanning for vulns grew so large that the server started creaking.
Fortunately I wasn't running anything vulnerable!
Comment by notkyle 15 hours ago
Comment by dr_dshiv 19 hours ago
It’s worth it, but I know they aren’t making money on me. But, of course I’m marketing them constantly so…