Cohere's First Model for Developers
Posted by hmokiguess 5 days ago
Comments
Comment by amunozo 1 day ago
Comment by namr2000 16 hours ago
Based on [2], a 30B model needs something like 2e+23 FLOPs to train from scratch, whereas a 1.6T model needs something like 1e+27 FLOPs. So DeepSeek v4 Pro was roughly 5000x more expensive to train than this model. I'm not totally sure how MoE affects scaling laws, so these numbers might be different in reality, but it gives you a good ballpark estimate of the difference in training scale.
[1] https://arxiv.org/abs/2505.12781 [2] https://arxiv.org/abs/2203.15556
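For anyone who wants to redo the arithmetic, here is a rough Python sketch. It assumes the common rule of thumb that training compute is about 6 x parameters x tokens, with about 20 training tokens per parameter. That is a simplification of the fits in [2], not the paper's exact method. The function name and the 20 tokens/param default are illustrative choices, and MoE sparsity is ignored entirely. It should only be expected to match the figures above to within an order of magnitude.

```python
def chinchilla_train_flops(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Rough compute-optimal training cost in FLOPs.

    Assumptions (not taken from the thread):
      - training FLOPs ~= 6 * N * D  (N = parameters, D = training tokens)
      - D ~= tokens_per_param * N    (about 20 for a Chinchilla-style budget)
      - dense model; MoE active-vs-total parameters are ignored
    """
    n_tokens = tokens_per_param * n_params
    return 6.0 * n_params * n_tokens


small = chinchilla_train_flops(30e9)    # 30B-parameter model
large = chinchilla_train_flops(1.6e12)  # 1.6T-parameter model

print(f"30B model:  {small:.2e} FLOPs")
print(f"1.6T model: {large:.2e} FLOPs")
print(f"ratio:      {large / small:.0f}x")
```

Under these assumptions the cost grows with the square of the parameter count, so the ratio depends only on the size ratio (1.6T / 30B, squared). The absolute numbers shift if the tokens-per-parameter budget differs from 20.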
Comment by amunozo 5 hours ago
Comment by matt_daemon 1 day ago
Cool to see this but seems like it would be pretty expensive to run
Comment by anon373839 1 day ago
Comment by ltononro 1 day ago
Comment by yencabulator 17 hours ago
Comment by montroser 20 hours ago
But yeah, it's not the best look to have to stretch and say it's "competitive" with other models in its weight class, when it offers not much else that's useful or novel.
Comment by moojacob 1 day ago
More competition is better.
Comment by SubiculumCode 1 day ago
Comment by sipjca 1 day ago
Comment by bitwize 1 day ago
Comment by doodlesdev 1 day ago
Comment by bitwize 14 hours ago
Comment by AbuAssar 22 hours ago
Comment by mkl 22 hours ago
Comment by zuzululu 1 day ago
Comment by greyb 1 day ago
It's being kept alive because the Canadian government is desperate to have a local frontier lab and is willing to inject funding and force its adoption in government services, but leadership at Cohere is known to be weak in Canadian tech circles, and they are pivoting to an enterprise-first market around production RAG rather than anything close to frontier work.
I'm glad they're doing open-weight releases, but they're not viable in the long run. It is embarrassing sharing similar spaces with them, but I'll try this release out in OpenCode and re-think afterwards.
Comment by daijj 18 hours ago
Comment by zuzululu 16 hours ago
Comment by suddenlybananas 1 day ago
Comment by moralestapia 23 hours ago
It’s truly embarrassing how much hand-holding those guys have received from angels, investors, the government, etc. To the point where the same investors they’re going to pitch to are preparing their slides, telling them what to say during the presentation, and then approving them for even more funding afterward, lol.
That government part is corruption and illegal, by the way.
Actual usage on many of their APIs/models is painfully low, like in ... hundreds of DAUs. I don't blame them for this, but this is a "company" that should have died 2 years ago.
Comment by chartpath 21 hours ago
Comment by osti 20 hours ago
Comment by zuzululu 18 hours ago
- the wife of a professor I knew in Canada apparently makes 400k/year for some Aboriginal art gallery that gets like two visits a year. They kicked out small businesses in that building so they could have 6000 sq ft for an art gallery that sits empty with the weirdest "art" that nobody has heard of.
- a Canadian coworker said that around 2020 there were like 3 developers who charged the Canadian government $70 million for some Flutter app that had ONE screen to check in and out of places due to quarantine, and it didn't even work.
- ten million here and there to raise diversity and LGBTQ awareness in African countries that don't even have running water or electricity, and other eyebrow-raising spending of tax money
- a founder raised money for a SaaS but was shut down by the Canadian government after not being issued a license. The exact same SaaS was later funded, and the person running it had political connections.
A country with a population 8 times smaller and an economy 15 times smaller than the USA's somehow has 7 times more tax employees. It's unclear what the end game is for Canada.
Comment by moralestapia 9 hours ago
I met one of these guys. Some parts of Canada are massively corrupt. If you're in the inner circle, the amount of things you get for free is unimaginable. If you're not, then you get the privilege of a 50%+ tax bill.
Comment by zuzululu 6 hours ago
Comment by dismalaf 17 hours ago
The Liberals can never lose an election because the people who rely on them for handouts outnumber the people who work in private enterprises and don't get handouts.
It's the Argentinian Peronist strategy...
Comment by kadoban 1 day ago
Comment by redwood 1 day ago
Comment by chattermate 21 hours ago
Comment by cyanydeez 3 days ago
Comment by lumost 1 day ago
Comment by stymaar 1 day ago
Comment by amunozo 1 day ago
Comment by SubiculumCode 1 day ago
Comment by daemonologist 1 day ago
Regular Qwen 3.6 benchmarks slightly better and has much wider software support though, so this is probably of interest only to organizations which disallow models trained in China.
Comment by kadoban 1 day ago
30B vs 35B isn't nothing either.
If it ends up just being some tweaks to someone else's weights, then meh.
Comment by mtone 1 day ago