Converge Digest
Thursday, June 11, 2026

NVIDIA Launches Rubin CPX GPU for Long-Context AI Inference

September 9, 2025
in All

NVIDIA is targeting the rising complexity of AI inference with a new disaggregated infrastructure strategy and a purpose-built GPU for long-context workloads. Modern AI systems are evolving into agentic models with multi-step reasoning, persistent memory, and long-horizon context, all of which raise compute, memory, and networking demands. Use cases such as full-codebase reasoning in software development, long-form video generation, and deep research require sustained coherence across millions of tokens, which strains current infrastructure.

The company’s SMART framework addresses this shift with a full-stack, disaggregated approach to inference, separating the compute-bound context (prefill) phase from the memory-bandwidth-bound generation (decode) phase. By provisioning and optimizing resources for each phase independently, NVIDIA aims to improve throughput, reduce latency, and enhance ROI. Its latest orchestration software, NVIDIA Dynamo, manages KV cache transfers and request routing between the two phases, and has already demonstrated record MLPerf Inference results on GB200 NVL72.
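The compute-bound versus memory-bound split can be illustrated with a rough roofline-style estimate. The sketch below is not NVIDIA's methodology: it considers only a single dense layer, ignores KV cache and activation traffic, and the hardware figures are hypothetical round numbers chosen for illustration rather than specifications of any product.

```python
# Hypothetical accelerator figures, for illustration only (not product specs).
PEAK_FLOPS = 1e15        # assumed peak compute, FLOPs per second
MEM_BANDWIDTH = 1e12     # assumed memory bandwidth, bytes per second
BYTES_PER_WEIGHT = 0.5   # 4-bit weights occupy half a byte each


def arithmetic_intensity(tokens_per_pass: int, bytes_per_weight: float) -> float:
    """FLOPs performed per byte of weights read for one dense layer.

    A d x d matrix multiply over `tokens_per_pass` tokens costs
    2 * tokens * d * d FLOPs and reads d * d * bytes_per_weight bytes of
    weights, so the layer width d cancels out of the ratio.
    """
    return 2.0 * tokens_per_pass / bytes_per_weight


def classify_phase(tokens_per_pass: int, bytes_per_weight: float,
                   peak_flops: float, mem_bandwidth: float) -> str:
    """Compare arithmetic intensity with the hardware ridge point."""
    ridge_point = peak_flops / mem_bandwidth  # FLOPs per byte at the roofline knee
    if arithmetic_intensity(tokens_per_pass, bytes_per_weight) >= ridge_point:
        return "compute-bound"
    return "memory-bound"


if __name__ == "__main__":
    # Context phase: thousands of prompt tokens share one read of the weights.
    print("context:", classify_phase(8192, BYTES_PER_WEIGHT, PEAK_FLOPS, MEM_BANDWIDTH))
    # Generation phase: one new token per sequence per pass over the weights.
    print("generation:", classify_phase(1, BYTES_PER_WEIGHT, PEAK_FLOPS, MEM_BANDWIDTH))
```

Under these assumptions the context pass lands far above the ridge point and the single-token generation pass far below it, which is the asymmetry that motivates sizing hardware for each phase separately. Batching many sequences during generation raises the intensity, so real deployments sit somewhere between the two extremes.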

To extend this model, NVIDIA introduced the Rubin CPX GPU, built to accelerate context-phase processing. Rubin CPX delivers 30 petaFLOPs of NVFP4 compute, 128 GB of GDDR7 memory, and 3× attention acceleration compared to GB300 NVL72, positioning it as a key enabler for long-context inference. At rack scale, the NVIDIA Vera Rubin NVL144 CPX integrates 144 Rubin CPX GPUs, 144 Rubin GPUs, and 36 Vera CPUs to achieve 8 exaFLOPs of compute, 100 TB of memory, and 1.7 PB/s of memory bandwidth. With Quantum-X800 InfiniBand or Spectrum-X Ethernet, paired with ConnectX-9 SuperNICs, the system supports million-token inference workloads. NVIDIA projects a 30–50× return on investment, or up to $5B in revenue from $100M in CAPEX.
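As a quick sanity check on the figures quoted above, the arithmetic can be worked through directly. This sketch uses only the numbers stated in the article; the constant and function names are illustrative, and it assumes the 8 exaFLOPs rack figure is quoted at the same NVFP4 precision as the per-GPU figure, which the article does not state.

```python
# Figures as quoted in the article.
CPX_GPUS_PER_RACK = 144
CPX_PFLOPS_EACH = 30          # NVFP4 petaFLOPs per Rubin CPX GPU
CPX_MEMORY_GB_EACH = 128      # GDDR7 per Rubin CPX GPU
RACK_EXAFLOPS = 8             # assumed to be the same precision as above
REVENUE_USD = 5_000_000_000
CAPEX_USD = 100_000_000


def cpx_compute_exaflops() -> float:
    """Aggregate Rubin CPX compute per rack, in exaFLOPs."""
    return CPX_GPUS_PER_RACK * CPX_PFLOPS_EACH / 1000


def cpx_memory_tb() -> float:
    """Aggregate Rubin CPX GDDR7 capacity per rack, in decimal TB."""
    return CPX_GPUS_PER_RACK * CPX_MEMORY_GB_EACH / 1000


def revenue_multiple() -> float:
    """Revenue divided by CAPEX, the upper end of the quoted 30-50x range."""
    return REVENUE_USD / CAPEX_USD


if __name__ == "__main__":
    print(f"CPX compute per rack: {cpx_compute_exaflops():.2f} exaFLOPs")
    print(f"CPX share of rack compute: {cpx_compute_exaflops() / RACK_EXAFLOPS:.0%}")
    print(f"CPX GDDR7 per rack: {cpx_memory_tb():.1f} TB")
    print(f"Revenue multiple: {revenue_multiple():.0f}x")
```

On these numbers the 144 Rubin CPX GPUs account for 4.32 of the 8 exaFLOPs and about 18.4 TB of the 100 TB, with the remainder presumably coming from the Rubin GPUs and Vera CPUs. The $5B on $100M example works out to 50×, the top of the quoted range, when ROI is read as revenue divided by CAPEX.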

• Inference is shifting to agentic models with persistent memory and multi-step reasoning.

• Disaggregated infrastructure separates context (compute-bound) from generation (memory-bound).

• Rubin CPX GPU: 30 petaFLOPs NVFP4, 128 GB GDDR7, 3× attention acceleration.

• Vera Rubin NVL144 CPX rack: 8 exaFLOPs compute, 100 TB memory, 1.7 PB/s bandwidth.

• Built to power million-token inference workloads for software, video, and research.

“Our Rubin CPX GPU and Vera Rubin NVL144 CPX rack set a new standard for context-aware inference, combining disaggregated architecture with full-stack orchestration to deliver unmatched efficiency and ROI,” NVIDIA stated.

🌐 Analysis: NVIDIA is addressing one of the most pressing bottlenecks in AI infrastructure: long-context inference. By separating the compute-bound context phase from the memory-bound generation phase, Rubin CPX lets hyperscalers scale each part of inference to match workload demands, much as Blackwell optimized training economics. This move also preempts competition from startups pushing specialized inference accelerators and from hyperscalers exploring in-house silicon. With Rubin CPX, NVIDIA is strengthening its hold on both training and inference markets as AI models move into multi-million-token horizons.

🌐 We’re tracking the latest developments in AI infrastructure. Follow our ongoing coverage at: https://convergedigest.com/category/ai-infrastructure/

Jim Carroll

Editor and Publisher, Converge! Network Digest, Optical Networks Daily - Covering the full stack of network convergence from Silicon Valley

© 2026 Converge Digest - A private dossier for networking and telecoms.
