NVIDIA has released Nemotron 3 Super, a 120B (12B active) open weights reasoning model with a hybrid Mamba-Transformer MoE architecture that scores 36 on the Artificial Analysis Intelligence Index
We were given access to this model ahead of launch and evaluated it across intelligence, openness, and inference efficiency.
Key takeaways
➤ Combines high openness with strong intelligence: Nemotron 3 Super performs strongly for its size and is substantially more intelligent than any other model with comparable openness
➤ Nemotron 3 Super scored 36 on the Artificial Analysis Intelligence Index, +17 points ahead of the previous Super release and +12 points ahead of Nemotron 3 Nano. Compared to models in a similar size category, this places it ahead of gpt-oss-120b (33) but behind the recently released Qwen3.5 122B A10B (42).
➤ Focused on efficient intelligence: we found Nemotron 3 Super to have higher intelligence than gpt-oss-120b while enabling ~10% higher throughput per GPU in a simple but realistic load test
➤ Supported today for fast serverless inference: providers including @DeepInfra and @LightningAI are serving this model at launch with speeds of up to 484 tokens per second
Model details
📝 Nemotron 3 Super has 120.6B total and 12.7B active parameters, along with a 1 million token context window and hybrid reasoning support. It is published with open weights and a permissive license, alongside open training data and methodology disclosure
📐 The model has several design features enabling efficient inference, including using hybrid Mamba-Transformer and LatentMoE architectures, multi-token prediction, and NVFP4 quantized weights
🎯 NVIDIA pre-trained Nemotron 3 Super in (mostly) NVFP4 precision, but moved to BF16 for post-training. Our evaluation scores use the BF16 weights
🧠 We benchmarked Nemotron 3 Super in its highest-effort reasoning mode ("regular"), the most capable of the model's three inference modes (reasoning-off, low-effort, and regular)

NVIDIA released significant pre- and post-training data alongside comprehensive new training recipes for this model. These disclosures score 83 on the Artificial Analysis Openness Index, behind only highly open models from Ai2 and MBZUAI, and place Nemotron 3 Super in the most attractive quadrant for Openness and Intelligence among peers.
Nemotron 3 Super is by far the most intelligent model ever released with this level of openness.

Nemotron 3 Super used a relatively high number of tokens across our evaluations: 110M output tokens to run the Artificial Analysis Intelligence Index, around 40% more than gpt-oss-120b with high reasoning effort but a ~20% reduction compared to Nemotron 3 Nano.
That’s significantly fewer tokens than Anthropic’s Claude Opus 4.6 (max), which used 160M tokens, and slightly fewer than OpenAI’s GPT-5.4 (xhigh), which used 120M tokens.
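As a back-of-envelope check of the token-usage comparisons above (a sketch using only the figures quoted in this post; the gpt-oss-120b and Nemotron 3 Nano totals are implied by the stated percentages, not directly measured):

```python
# Output tokens used to complete the Artificial Analysis Intelligence Index
# evaluations, in millions, as quoted above.
super_tokens = 110  # Nemotron 3 Super
opus_tokens = 160   # Claude Opus 4.6 (max)
gpt54_tokens = 120  # GPT-5.4 (xhigh)

# "~40% more than gpt-oss-120b" implies gpt-oss-120b used roughly:
gpt_oss_tokens = super_tokens / 1.4   # ≈ 78.6M
# "a ~20% reduction compared to Nemotron 3 Nano" implies Nano used roughly:
nano_tokens = super_tokens / 0.8      # = 137.5M

print(f"vs Claude Opus 4.6 (max): {1 - super_tokens / opus_tokens:.0%} fewer tokens")   # ≈ 31% fewer
print(f"vs GPT-5.4 (xhigh):       {1 - super_tokens / gpt54_tokens:.0%} fewer tokens")  # ≈ 8% fewer
```

The derived figures line up with the claims in the post: "significantly fewer" than Opus 4.6 and "slightly fewer" than GPT-5.4.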

At 120B total with 12B active parameters, Nemotron 3 Super is still relatively small compared to other recent open weights model releases from top global labs — GLM-5 (744B total, 40B active), Qwen3.5 397B A17B (397B total, 17B active), and Kimi K2.5 (1T total, 32B active) are each 3x to 8x larger.
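The "3x to 8x larger" range above follows directly from the quoted total-parameter counts (a quick sketch; sizes in billions as stated in the post):

```python
# Total-parameter counts quoted above, in billions.
models = {
    "GLM-5": 744,
    "Qwen3.5 397B A17B": 397,
    "Kimi K2.5": 1000,  # 1T total
}
nemotron_super = 120.6

for name, size in models.items():
    print(f"{name}: {size / nemotron_super:.1f}x larger than Nemotron 3 Super")
# Qwen3.5 ≈ 3.3x, GLM-5 ≈ 6.2x, Kimi K2.5 ≈ 8.3x
```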

NVIDIA is focused on efficient intelligence for the Nemotron family, and we tested inference performance against peer models to see the impact of the architecture choices.
We ran self-hosted throughput tests across a range of peer models using a simple methodology with workloads representative of common use cases such as agentic workflows with moderate history, RAG applications, or document processing.
In this test, Nemotron 3 Super (NVFP4) shows 11% higher throughput per NVIDIA B200 GPU than gpt-oss-120b (MXFP4), placing Nemotron 3 Super ‘up and to the right’ relative to gpt-oss-120b. Qwen3.5 122B A10B achieves +6 points on the Intelligence Index compared to Nemotron 3 Super, but at 40% lower throughput per GPU.
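Normalizing the relative throughput figures above makes the tradeoff easier to see (a sketch using gpt-oss-120b as the 1.0 baseline; the Qwen3.5 figure is derived from the "40% lower throughput than Nemotron 3 Super" claim, not an absolute measurement):

```python
# Relative throughput per NVIDIA B200 GPU, derived from the figures above.
gpt_oss = 1.00                 # baseline (MXFP4)
nemotron = gpt_oss * 1.11      # 11% higher than gpt-oss-120b (NVFP4)
qwen = nemotron * (1 - 0.40)   # 40% lower than Nemotron 3 Super

intelligence = {"gpt-oss-120b": 33, "Nemotron 3 Super": 36, "Qwen3.5 122B A10B": 42}
throughput = {"gpt-oss-120b": gpt_oss, "Nemotron 3 Super": nemotron, "Qwen3.5 122B A10B": qwen}

for name in intelligence:
    print(f"{name}: Intelligence {intelligence[name]}, relative throughput {throughput[name]:.2f}")
```

On these derived numbers, Qwen3.5 122B A10B runs at roughly two-thirds of Nemotron 3 Super's per-GPU throughput in exchange for its higher Intelligence Index score.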
Our Intelligence Index scores for Nemotron 3 Super were evaluated on the BF16 weights. We have not yet assessed whether there is any intelligence impact of NVFP4 quantization, but NVIDIA’s internal testing found that the NVFP4 model achieved 99.8% median accuracy relative to the BF16 baseline.
For more details of our testing setup and model configurations, see our article on Nemotron 3 Super:


Nemotron 3 Super is available at launch on serverless APIs from providers including Lightning AI and DeepInfra.
We tested these endpoints and observed output speeds of up to 484 tokens per second on our standard 10k token input workloads.
At launch, Nemotron 3 Super sits in the most attractive quadrant for intelligence and output speed among comparable peers.
