DeepSeek v4 Pro 1.6T model post-trained by Huawei on 1000 Ascend 910C chips
Posted by theanonymousone 1 day ago
Comments
Comment by wg0 22 hours ago
As if OpenAI and Anthropic are giving us ball-by-ball commentary on how their training runs go. DeepSeek did train it on domestic hardware, the model might be out in public soon (open weights or not), and then anyone can see what it is about.
Comment by mlsu 21 hours ago
AI workloads are very simple and massively parallel. Energy into the workload, trained artifact out. The bigger the artifact, the more parallel you have to be, and hence the more energy you have to use. (The SW engineering to make it possible is difficult but fundamentally tractable.)
Because of this, it is possible for China to train competitive models even at a fraction of the power efficiency of the advanced US chips. Energy becomes the issue more and more as the models get larger.
We have a serious problem with the cost of energy in the US. AI dominance is far from guaranteed. I DGAF but the policymakers who seem to care about this are not doing much to fix the situation. Like in EVs and solar, we are going to start getting lapped. Instead of export controls in the USA->China direction, we're going to start seeing import controls in the China->USA direction, for inference.