Particle.news

Nvidia’s Blackwell Ultra Delivers 50× Efficiency Gain for Agentic AI

Nvidia credits a 72‑GPU NVLink fabric with NVFP4 precision, faster attention processing, and tuned inference libraries for lower token costs at low latency.

Overview

  • New data from Nvidia reports that GB300 NVL72 achieves up to 50× higher throughput per megawatt and 35× lower cost per token than Hopper for low‑latency workloads.
  • For long‑context tasks, GB300 NVL72 delivers up to 1.5× lower cost per token than GB200 NVL72, with 1.5× higher NVFP4 compute and 2× faster attention.
  • Blackwell Ultra links 72 GPUs in a unified NVLink fabric reported at about 130 TB/s, a design aimed at boosting throughput and scaling long‑context inference.
  • Microsoft, CoreWeave, and Oracle Cloud Infrastructure are deploying GB300 NVL72 in production, while Signal65 and SemiAnalysis report large gains, including over 1.1 million tokens per second on a single rack.
  • Nvidia previews the Rubin/Vera Rubin platform as a next step, projecting up to 10× higher throughput per megawatt for MoE inference compared with Blackwell.
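The throughput‑per‑megawatt figures above translate into energy cost per token when serving cost is dominated by electricity. A minimal sketch of that arithmetic, where the function name and every number are illustrative assumptions rather than Nvidia data:

```python
# Hedged sketch: how throughput per megawatt maps to energy cost per token.
# All figures below are made-up placeholders, not measured Nvidia results.

def energy_cost_per_million_tokens(tokens_per_sec_per_mw: float,
                                   price_per_mwh: float) -> float:
    """USD of electricity to generate one million tokens, assuming the
    serving cost is energy-dominated (ignores capex, cooling, etc.)."""
    tokens_per_mwh = tokens_per_sec_per_mw * 3600  # tokens per MW-hour of draw
    return price_per_mwh / tokens_per_mwh * 1_000_000

# Hypothetical baseline vs. a 50x throughput-per-MW improvement at $100/MWh:
baseline = energy_cost_per_million_tokens(10_000, 100.0)
improved = energy_cost_per_million_tokens(500_000, 100.0)
print(round(baseline / improved, 1))  # energy cost per token falls 50x
```

Under this energy‑only model the cost ratio tracks the throughput ratio exactly; the article's smaller 35× cost‑per‑token figure suggests real serving cost includes components that don't scale with power efficiency.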