Overview
- Nvidia reports that GB300 NVL72 achieves up to 50× higher throughput per megawatt and 35× lower cost per token than Hopper for low‑latency workloads.
- For long‑context tasks, GB300 NVL72 delivers up to 1.5× lower cost per token than GB200 NVL72, with 1.5× higher NVFP4 compute and 2× faster attention.
- Blackwell Ultra links 72 GPUs in a unified NVLink fabric reported at about 130 TB/s, a design aimed at boosting throughput and scaling long‑context inference.
- Microsoft, CoreWeave and Oracle Cloud Infrastructure are deploying GB300 NVL72 in production, while Signal65 and SemiAnalysis report large gains, including over 1.1 million tokens per second from a single rack.
- Nvidia previews the Rubin/Vera Rubin platform as a next step, projecting up to 10× higher throughput per megawatt for MoE inference compared with Blackwell.
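To make the headline multipliers above concrete, the sketch below shows how a throughput-per-megawatt gain translates into an energy-cost-per-token reduction. All baseline figures (`hopper_tps`, `rack_power_mw`, `cost_per_mwh`) are hypothetical placeholders, not published numbers; note that the reported 35× cost-per-token figure covers more than energy alone, so this only illustrates the arithmetic relationship.

```python
# Hedged sketch: relating "throughput per megawatt" to "energy cost per
# token" using the 50x multiplier quoted above. Baseline values below
# are hypothetical placeholders, NOT published specifications.

hopper_tps = 10_000      # hypothetical Hopper tokens/sec per rack
rack_power_mw = 0.1      # hypothetical rack power draw in MW
cost_per_mwh = 100.0     # hypothetical energy cost in $/MWh

# Baseline throughput per megawatt
hopper_tps_per_mw = hopper_tps / rack_power_mw

# Applying the reported 50x throughput-per-MW multiplier
gb300_tps_per_mw = 50 * hopper_tps_per_mw

def energy_cost_per_token(tps_per_mw: float, dollars_per_mwh: float) -> float:
    """Energy cost per token scales inversely with tokens produced per MWh."""
    tokens_per_mwh = tps_per_mw * 3600  # 1 MWh sustains 1 MW for 3600 s
    return dollars_per_mwh / tokens_per_mwh

ratio = (energy_cost_per_token(hopper_tps_per_mw, cost_per_mwh)
         / energy_cost_per_token(gb300_tps_per_mw, cost_per_mwh))
print(f"energy-cost-per-token improvement: {ratio:.0f}x")
```

By construction the energy-cost ratio equals the throughput-per-MW multiplier at a fixed energy price; the larger reported cost-per-token gap would come from additional factors (hardware amortization, utilization) not modeled here.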