Nvidia Nemotron 3 Family of Models
Posted by ewt-nv 1 day ago
Comments
Comment by wcallahan 1 day ago
As someone else mentioned, the GPT-OSS models are also quite good (though I haven't found how to make them great yet; I suspect they might age well like the Llama 3 models did and get better with time!).
But for a defined task, I’ve found task compliance, understanding, and tool call success rates to be some of the highest on these Nvidia models.
For example, I have a continuous job that evaluates whether the data for a startup company on aVenture.vc could have overlapped or conflated two similar but unrelated companies across news articles, research details, investment rounds, etc… which is a token-hungry ETL task! I recently retested this workflow on the top 15 or so models with <125B parameters, and the Nvidia models were among the best performing for this type of work, particularly around avoiding hallucination when given adequate grounding.
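A stripped-down sketch of what one round of that check looks like (illustrative only, not my actual pipeline; the model id and prompt below are placeholders):

    # Hypothetical conflation check via OpenRouter's OpenAI-compatible API.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="YOUR_OPENROUTER_KEY",  # placeholder
    )

    def check_conflation(record_a: str, record_b: str) -> str:
        """Ask the model whether two startup records describe the same company."""
        resp = client.chat.completions.create(
            model="nvidia/nemotron-3-nano:free",  # assumed id; check OpenRouter's model list
            messages=[
                {"role": "system", "content": (
                    "You deduplicate startup records. Answer SAME, DIFFERENT, or "
                    "UNSURE, then give one sentence of justification grounded "
                    "only in the provided text."
                )},
                {"role": "user", "content": f"Record A:\n{record_a}\n\nRecord B:\n{record_b}"},
            ],
            temperature=0.0,
        )
        return resp.choices[0].message.content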
Also, re: cost - I run local inference on several machines continuously, in addition to routing through OpenRouter and the frontier providers, and was pleasantly surprised to find that, as an otherwise-paying OpenRouter customer, the limits on Nvidia's free variant there are quite generous too.
Comment by kgeist 7 hours ago
I recently pitted gpt-oss 120b against Qwen3-Next 80b on a lot of internal benchmarks (for production use). For me, gpt-oss was slightly slower (vLLM, both fit in VRAM), much worse at multilingual tasks (33 languages evaluated), and worse at instruction following (e.g., Qwen3-Next could reuse the same prompts I'd written for Gemma3 perfectly, while gpt-oss struggled and RAG benchmarks suddenly dropped from 90% to 60% without additional prompt engineering).
And that's with Qwen3-Next being a random unofficial 4-bit quant (versus gpt-oss having native support), plus I had to disable multi-token prediction in Qwen3-Next because vLLM crashed with it.
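For reference, the comparison harness was essentially this shape (simplified sketch; the ports, served model names, and prompts are placeholders):

    # Same prompts against two vLLM OpenAI-compatible endpoints, timing each.
    import time
    from openai import OpenAI

    ENDPOINTS = {
        "gpt-oss-120b": OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY"),
        "qwen3-next-80b": OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY"),
    }
    PROMPTS = ["...", "..."]  # stand-ins for the real benchmark suites

    for name, client in ENDPOINTS.items():
        start = time.perf_counter()
        for prompt in PROMPTS:
            client.chat.completions.create(
                model=name,  # vLLM serves under the name given at launch
                messages=[{"role": "user", "content": prompt}],
                max_tokens=256,
            )
        print(f"{name}: {time.perf_counter() - start:.1f}s for {len(PROMPTS)} prompts")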
Has anyone here tried both gpt-oss 120b and Qwen3-Next 80b? Maybe I was doing something wrong, because I've seen a lot of people praise gpt-oss.
Comment by scrlk 5 hours ago
> We trained the models on a mostly English, text-only dataset, with a focus on STEM, coding, and general knowledge.
Comment by red2awn 1 day ago
* Hybrid MoE: 2-3x faster than pure MoE transformers
* 1M context length
* Trained on NVFP4
* Open Source! Pretraining, mid-training, SFT and RL dataset released (SFT HF link is 404...)
* Open model training recipe (coming soon)
Really appreciate Nvidia being the most open lab, but they really should make sure all the links/data are available on day 0.
Also interesting that the model is trained in NVFP4 but the inference weights are FP8.
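If you want to verify the released format yourself, the checkpoint's config should show it (the repo id below is a guess; substitute the real one):

    # Peek at a Hugging Face checkpoint's quantization_config.
    import json
    from huggingface_hub import hf_hub_download

    repo_id = "nvidia/Nemotron-3-Nano"  # hypothetical id; use the actual repo name
    cfg_path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(cfg_path) as f:
        cfg = json.load(f)
    print(cfg.get("quantization_config", "no quantization_config key"))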
Comment by sosodev 9 hours ago
I've noticed that open models have made huge efficiency gains in the past several months. Some of that is explained by architectural improvements, but it seems quite obvious that a large portion of the gains comes from the heavy use of synthetic training data.
In this case, roughly 33% of the training tokens are synthetically generated by a mix of other open-weight models. I wonder if this trend is sustainable or if it might lead to model collapse, as some have predicted. I suspect that the proliferation of synthetic data throughout open-weight models has led to a lot of the ChatGPT writing-style replication (many bullet points, em dashes, "it's not X but actually Y", etc.).
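A crude way to quantify what I mean (toy heuristic; the patterns and normalization are arbitrary):

    # Count a few "ChatGPT-isms" per 1k characters of text.
    import re

    def chatgpt_isms(text: str) -> dict:
        per_1k = 1000 / max(len(text), 1)
        return {
            "em_dashes": text.count("\u2014") * per_1k,
            "bullets": len(re.findall(r"^\s*[-*\u2022] ", text, re.M)) * per_1k,
            "not_x_but_y": len(re.findall(
                r"\bnot (?:just |only )?\w+[^.]{0,40}\bbut\b", text, re.I)) * per_1k,
        }

    print(chatgpt_isms("It's not merely fast, but a paradigm shift \u2014 truly."))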
Comment by pants2 1 day ago
However, this looks like it has great potential for cost-effectiveness. As of today it's free to use via the API on OpenRouter, so it's a bit unclear what it'll cost when it's not free, but free is free!
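OpenRouter's public models endpoint lists per-token pricing, so it's easy to watch for when the free tag goes away (the substring filter below is just for convenience):

    # List pricing for any OpenRouter model whose id mentions "nemotron".
    import requests

    models = requests.get("https://openrouter.ai/api/v1/models", timeout=30).json()["data"]
    for m in models:
        if "nemotron" in m["id"].lower():
            p = m.get("pricing", {})
            print(m["id"], "| prompt:", p.get("prompt"), "| completion:", p.get("completion"))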
Comment by viraptor 1 day ago
That's temporary. Cerebras speeds up everything, so if Nemotron is good quality, it's just a matter of time until they add it.
Comment by agentastic 7 hours ago
Nemotron, on the other hand, is a hybrid (Transformer + Mamba-2), so it will be more challenging to compile for Cerebras/Groq chips.
(Methinks Nvidia is purposefully picking an architecture + FP4 combination that is easy to ship on Nvidia chips, but harder for TPU or Cerebras/Groq to deploy.)
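For intuition on why: the Mamba-style layer is built around a sequential state scan, unlike attention's batched matmuls. A toy scalar-input version (nothing like the real Mamba-2 kernels, just the dataflow shape):

    # y_t = C.h_t where h_t = A*h_{t-1} + B*x_t, stepped through time.
    import numpy as np

    def ssm_scan(x, A, B, C):
        h = np.zeros_like(B)      # recurrent state carried across steps
        ys = []
        for x_t in x:             # inherently sequential over time
            h = A * h + B * x_t
            ys.append(float(C @ h))
        return np.array(ys)

    y = ssm_scan(x=np.random.randn(16), A=np.full(8, 0.9),
                 B=np.random.randn(8), C=np.random.randn(8))
    print(y.shape)  # (16,)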
Comment by ofermend 6 hours ago
It scores a 9.6% hallucination rate, similar to qwen3-next-80b-a3b-thinking (9.3%), but of course it is much smaller.
Comment by kristopolous 9 hours ago
I'm guessing there's some sophistication in the instrumentation I'm just not up to date with.
Comment by mark_l_watson 8 hours ago
Today I ran a few simple cases on Ollama, but not much real testing.
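For scripting the same quick checks, the ollama Python client works too (the model tag below is a guess; use whatever `ollama list` actually shows):

    import ollama

    resp = ollama.chat(
        model="nemotron-3",  # hypothetical tag
        messages=[{"role": "user", "content": "In one sentence, what is NVFP4?"}],
    )
    print(resp["message"]["content"])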
Comment by axoltl 9 hours ago
But if you're OK running it without a UI wrapper, mlx_lm==0.30.0 will serve you fine.
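Or drive it from Python directly (a minimal sketch; the repo id is a placeholder for an MLX-converted checkpoint, and I'm assuming the load/generate API as of mlx_lm 0.30):

    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Nemotron-3-Nano-4bit")  # hypothetical repo
    print(generate(model, tokenizer, prompt="Hello", max_tokens=64))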
Comment by keyle 4 hours ago
https://lmstudio.ai/models/nemotron-3
Simplest to just install it from the app.
Comment by netghost 9 hours ago
> Nemotron 3 Nano is a 3.2B active (3.6B with embeddings) 31.6B total parameter model.
So I don't know the exact math once you have a MoE, but roughly: the 3.2B active parameters set the per-token compute, so it'll run fast on most anything, while all 31.6B total parameters still have to sit in memory, so you're looking at needing a pretty large amount of RAM.
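Back-of-envelope for the weights alone (ignoring KV cache and activations):

    TOTAL_PARAMS = 31.6e9  # all experts must be resident, not just the active ones

    for name, bytes_per_param in [("FP16/BF16", 2), ("FP8", 1), ("4-bit", 0.5)]:
        print(f"{name}: ~{TOTAL_PARAMS * bytes_per_param / 1e9:.0f} GB of weights")
    # FP16/BF16: ~63 GB, FP8: ~32 GB, 4-bit: ~16 GB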
Comment by sosodev 9 hours ago
However, is cost the biggest limiting factor for agent adoption at this point? I would suspect that the much harder part is just creating an agent that yields meaningful results.