Converge Digest

NVIDIA: Blackwell Ultra Cuts Inference Token Costs by Up to 35x vs. Hopper

February 16, 2026
in AI Infrastructure

NVIDIA’s Blackwell Ultra architecture is delivering major gains in inference economics for agentic AI workloads, according to new InferenceX data published by SemiAnalysis. In a February 16 NVIDIA blog post, author Ashraf Eassa reported that GB300 NVL72 systems provide up to 50x higher throughput per megawatt and as much as 35x lower cost per token than the prior-generation Hopper platform. The findings focus on low-latency and long-context workloads such as AI coding agents and interactive assistants. Software programming now accounts for roughly half of AI queries, up from 11% last year, according to OpenRouter’s State of Inference report.

The SemiAnalysis data attributes the gains to a combination of Blackwell Ultra silicon advances and ongoing software stack optimization across TensorRT-LLM, Dynamo, Mooncake and SGLang. GB300 NVL72 integrates Blackwell Ultra GPUs with NVLink Symmetric Memory and optimized GPU kernels designed to minimize idle cycles through programmatic dependent launch. In low-latency inference scenarios, including multi-step agentic coding workflows, GB300 NVL72 delivers up to 35x lower cost per million tokens compared to Hopper. For long-context workloads—such as 128,000-token inputs with 8,000-token outputs—GB300 achieves up to 1.5x lower cost per token than GB200 NVL72, reflecting improvements in NVFP4 compute performance and faster attention processing.

Cloud providers are deploying the platform at scale. Microsoft, CoreWeave and Oracle Cloud Infrastructure are rolling out GB300 NVL72 systems for production inference targeting coding assistants and other agentic AI applications. SemiAnalysis reports that the improvements extend the momentum already seen with Blackwell deployments among inference providers such as Baseten, DeepInfra, Fireworks AI and Together AI, which cited up to 10x reductions in cost per token with earlier Blackwell systems.

• SemiAnalysis InferenceX data shows up to 50x higher throughput per megawatt for GB300 NVL72 vs. Hopper

• Up to 35x lower cost per million tokens for low-latency agentic AI workloads

• 1.5x lower cost per token vs. GB200 NVL72 for 128K-token long-context use cases

• Software optimizations in TensorRT-LLM and Dynamo deliver up to 5x performance gains on GB200 over four months

• NVFP4 compute improves 1.5x and attention processing doubles vs. prior generation

• GB300 NVL72 deployed by Microsoft, CoreWeave and Oracle Cloud Infrastructure for production inference
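
For readers who want to see how the headline metrics relate to each other, the following is a minimal sketch of the arithmetic behind "cost per million tokens" and "throughput per megawatt." It is not the InferenceX methodology, which the article does not describe. The function names and the inputs (an hourly system cost, sustained tokens per second and power draw in kilowatts) are assumptions chosen for illustration, and any numbers passed in are hypothetical.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to generate one million tokens at a sustained throughput.

    Assumes a flat hourly system cost (hypothetical input, not from the article).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def tokens_per_second_per_megawatt(tokens_per_second: float, power_kw: float) -> float:
    """Throughput normalized by power draw, in tokens per second per megawatt."""
    if power_kw <= 0:
        raise ValueError("power_kw must be positive")
    return tokens_per_second / (power_kw / 1000)


def improvement_factor(baseline_cost: float, new_cost: float) -> float:
    """How many times lower the new cost is than the baseline (e.g. 35.0 means 35x lower)."""
    if new_cost <= 0:
        raise ValueError("new_cost must be positive")
    return baseline_cost / new_cost
```

Under these assumptions, an "Nx lower cost per token" claim is the ratio of two cost-per-million-token figures measured at comparable latency targets, so it moves with both the throughput gain and any change in system cost.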

“As inference moves to the center of AI production, long-context performance and token efficiency become critical,” said Chen Goldberg, senior vice president of engineering at CoreWeave. “Grace Blackwell NVL72 addresses that challenge directly, and CoreWeave’s AI cloud, including CKS and SUNK, is designed to translate GB300 systems’ gains, building on the success of GB200, into predictable performance and cost efficiency. The result is better token economics and more usable inference for customers running workloads at scale.”

https://blogs.nvidia.com/blog/data-blackwell-ultra-performance-lower-cost-agentic-ai

🌐 Analysis: The SemiAnalysis data reinforces the shift in hyperscaler CapEx toward inference-optimized infrastructure as agentic AI and coding workloads expand. NVIDIA’s roadmap—from Hopper to Blackwell Ultra and the forthcoming Rubin architecture—positions throughput-per-megawatt and token economics as primary competitive metrics, an area where rivals including AMD and custom silicon efforts from hyperscalers are also intensifying focus.


Jim Carroll

Editor and Publisher, Converge! Network Digest, Optical Networks Daily - Covering the full stack of network convergence from Silicon Valley
