Overview
- Nvidia reports that GB300 NVL72 achieves up to 50× higher throughput per megawatt and 35× lower cost per token than Hopper for low‑latency workloads.
- For long‑context tasks, GB300 NVL72 delivers up to 1.5× lower cost per token than GB200 NVL72, with 1.5× higher NVFP4 compute and 2× faster attention.
- Blackwell Ultra links 72 GPUs in a unified NVLink fabric reported at about 130 TB/s, a design aimed at boosting throughput and scaling long‑context inference.
- Microsoft, CoreWeave and Oracle Cloud Infrastructure are deploying GB300 NVL72 in production, while Signal65 and SemiAnalysis report large gains, including over 1.1 million tokens per second from a single rack.
- Nvidia previews the Rubin/Vera Rubin platform as a next step, projecting up to 10× higher throughput per megawatt for MoE inference compared with Blackwell.
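To make the headline multipliers above concrete, the sketch below shows how a throughput-per-megawatt gain translates into an energy-cost-per-token reduction. All baseline figures (`hopper_tps`, `rack_power_mw`, `cost_per_mwh`) are hypothetical placeholders, not published numbers; note that the reported 35× cost-per-token figure covers more than energy alone, so this only illustrates the arithmetic relationship.

```python
# Hedged sketch: relating "throughput per megawatt" to "energy cost per
# token" using the 50x multiplier quoted above. Baseline values below
# are hypothetical placeholders, NOT published specifications.

hopper_tps = 10_000      # hypothetical Hopper tokens/sec per rack
rack_power_mw = 0.1      # hypothetical rack power draw in MW
cost_per_mwh = 100.0     # hypothetical energy cost in $/MWh

# Baseline throughput per megawatt
hopper_tps_per_mw = hopper_tps / rack_power_mw

# Applying the reported 50x throughput-per-MW multiplier
gb300_tps_per_mw = 50 * hopper_tps_per_mw

def energy_cost_per_token(tps_per_mw: float, dollars_per_mwh: float) -> float:
    """Energy cost per token scales inversely with tokens produced per MWh."""
    tokens_per_mwh = tps_per_mw * 3600  # 1 MWh sustains 1 MW for 3600 s
    return dollars_per_mwh / tokens_per_mwh

ratio = (energy_cost_per_token(hopper_tps_per_mw, cost_per_mwh)
         / energy_cost_per_token(gb300_tps_per_mw, cost_per_mwh))
print(f"energy-cost-per-token improvement: {ratio:.0f}x")
```

By construction the energy-cost ratio equals the throughput-per-MW multiplier at a fixed energy price; the larger reported cost-per-token gap would come from additional factors (hardware amortization, utilization) not modeled here.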