Nemotron-Cascade 2 is a 30B-parameter Mixture-of-Experts model that activates only 3B parameters per forward pass. It’s the second open-weight model to reach gold-medal level on IMO 2025, IOI 2025, and the ICPC World Finals, after DeepSeek at 671B, more than 20x its size.
The core technique is Cascade RL: sequential, domain-by-domain reinforcement learning, where each domain (math, code, instruction following, SWE agents) gets its own hyperparameters without destabilizing the others. The novel addition is Multi-Domain On-Policy Distillation (MOPD): when a Cascade RL stage causes regressions in other domains, the model is distilled on the fly from the best intermediate checkpoint for that domain, used as a teacher. On AIME 2025, MOPD reached teacher-level performance in 30 steps, while GRPO reached only 91.0 in the same number of steps.
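To make the "on-policy" part of MOPD concrete, here is a minimal numpy sketch of the generic on-policy distillation loss, not NVIDIA's actual implementation: the student generates a rollout, then both student and teacher score those same tokens, and the loss is the reverse KL between the two next-token distributions along that student-sampled trajectory. Shapes and function names are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocab dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def on_policy_distill_loss(student_logits, teacher_logits):
    """Reverse KL(student || teacher), averaged over positions.

    Both inputs are (seq_len, vocab) logits evaluated on a rollout the
    *student* generated. That is the on-policy part: the gradient signal
    comes from the student's own trajectory, not teacher-forced data.
    """
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    kl = (p_s * (np.log(p_s) - np.log(p_t))).sum(axis=-1)  # per-token KL
    return kl.mean()

rng = np.random.default_rng(0)
seq_len, vocab = 8, 16
student = rng.normal(size=(seq_len, vocab))
teacher = rng.normal(size=(seq_len, vocab))

loss_self = on_policy_distill_loss(student, student)  # identical dists -> 0
loss_diff = on_policy_distill_loss(student, teacher)  # mismatch -> positive
print(loss_self == 0.0, loss_diff > 0.0)
```

In a real training loop the loss would be minimized with respect to the student's parameters only; the teacher (here, the best intermediate checkpoint for the regressing domain) stays frozen.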
The trade-off is explicit: it beats Qwen3.5-35B-A3B (a comparable-size model) on math, code, and alignment, but lags on knowledge benchmarks (MMLU-Pro, GPQA) and agentic tasks. NVIDIA attributes this to weaker pretraining and less agentic RL coverage.
Weights, SFT data, and RL data are all open.