NVIDIA Launches Rubin CPX GPU for Long-Context AI Inference

NVIDIA is targeting the rising complexity of AI inference with a new disaggregated infrastructure strategy and a purpose-built GPU for long-context workloads. Modern AI systems are evolving into agentic models with multi-step reasoning, persistent memory, and long-horizon context, which impose unprecedented compute, memory, and networking demands. Use cases such as full-codebase reasoning in software development, long-form video generation, and deep research require models to stay coherent across millions of tokens, which strains current infrastructure.
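A back-of-the-envelope estimate of key/value (KV) cache size shows why million-token contexts strain memory. The sketch below assumes a hypothetical model shape: 80 layers, 8 grouped-query KV heads of dimension 128, and a 1-byte (8-bit) cache entry. It does not describe any specific NVIDIA or third-party model.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   num_tokens: int, bytes_per_value: float) -> float:
    """Approximate KV cache size for a single sequence.

    Every layer stores one key vector and one value vector (the factor of 2)
    per KV head for every token in the context.
    """
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_value


# Hypothetical model shape (assumption, for illustration only).
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

for tokens in (8_000, 128_000, 1_000_000):
    gb = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, tokens, 1) / 1e9
    print(f"{tokens:>9,} tokens -> {gb:7.1f} GB of KV cache")
```

With this shape each token costs 163,840 bytes of cache, so one million-token sequence needs about 164 GB of KV cache before model weights or activations are counted.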

The company’s SMART framework addresses this shift with a full-stack, disaggregated approach to inference that separates the compute-bound context phase from the memory-bandwidth-bound generation phase. Because each phase can run on hardware sized for its own bottleneck, NVIDIA aims to improve throughput, reduce latency, and raise ROI. Its latest orchestration software, NVIDIA Dynamo, manages KV cache transfers and request routing, and has already demonstrated record MLPerf Inference results on GB200 NVL72.
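The compute-bound versus memory-bound split can be illustrated with a simple roofline-style calculation of arithmetic intensity (FLOPs per byte moved) for one projection layer. This is a minimal sketch under stated assumptions: the hidden size, 8-bit data, and the `RIDGE` threshold standing in for an accelerator's FLOPs-to-bandwidth ratio are all hypothetical values, not Rubin CPX specifications.

```python
def matmul_intensity(tokens: int, d_model: int, bytes_per_elem: float) -> float:
    """FLOPs per byte for multiplying a [tokens x d] activation by a [d x d] weight.

    FLOPs: 2 * tokens * d * d (one multiply and one add per multiply-accumulate).
    Bytes: the weight matrix plus the input and output activations, each moved
    once (an idealized lower bound on memory traffic).
    """
    flops = 2 * tokens * d_model * d_model
    bytes_moved = bytes_per_elem * (d_model * d_model + 2 * tokens * d_model)
    return flops / bytes_moved


D_MODEL = 8192   # hypothetical hidden size
BYTES = 1        # 8-bit weights and activations (assumption)
RIDGE = 300      # hypothetical accelerator FLOPs-per-byte ridge point

for phase, tokens in (("context (32k-token prompt)", 32_768),
                      ("generation (1 token per step)", 1)):
    ai = matmul_intensity(tokens, D_MODEL, BYTES)
    bound = "compute-bound" if ai > RIDGE else "memory-bandwidth-bound"
    print(f"{phase}: {ai:,.1f} FLOPs/byte -> {bound}")
```

The context pass reuses each weight across thousands of prompt tokens (about 7,300 FLOPs per byte here), while a single-token generation step manages roughly 2 FLOPs per byte. Batching many sequences raises decode intensity, but each step must also read every sequence's KV cache, so generation generally remains limited by memory bandwidth.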

To extend this model, NVIDIA introduced the Rubin CPX GPU, built to accelerate context-phase processing. Rubin CPX delivers 30 petaFLOPs of NVFP4 compute, 128 GB of GDDR7 memory, and 3× attention acceleration compared to GB300 NVL72, positioning it as a key enabler for long-context inference. At rack scale, the NVIDIA Vera Rubin NVL144 CPX integrates 144 Rubin CPX GPUs, 144 Rubin GPUs, and 36 Vera CPUs to achieve 8 exaFLOPs of compute, 100 TB of fast memory, and 1.7 PB/s of memory bandwidth. Paired with ConnectX-9 SuperNICs over Quantum-X800 InfiniBand or Spectrum-X Ethernet, the system supports million-token inference workloads, and NVIDIA projects a 30× to 50× return on investment: up to $5B in revenue for every $100M in CAPEX.
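The quoted figures can be cross-checked with simple arithmetic. The sketch below uses only numbers from the announcement. It assumes the rack's 8 exaFLOPs is measured in the same NVFP4 terms as the per-GPU figure and that memory totals use decimal units.

```python
# Figures quoted in the announcement.
CPX_PER_RACK = 144
CPX_NVFP4_PFLOPS = 30      # per Rubin CPX GPU
CPX_GDDR7_GB = 128         # per Rubin CPX GPU
RACK_EFLOPS = 8            # assumed to be NVFP4, like the per-GPU figure
RACK_MEMORY_TB = 100
CAPEX_USD = 100e6
REVENUE_USD = 5e9

cpx_eflops = CPX_PER_RACK * CPX_NVFP4_PFLOPS / 1000   # petaFLOPs -> exaFLOPs
cpx_memory_tb = CPX_PER_RACK * CPX_GDDR7_GB / 1000    # GB -> TB (decimal)
roi_multiple = REVENUE_USD / CAPEX_USD

print(f"Rubin CPX compute: {cpx_eflops:.2f} EF of {RACK_EFLOPS} EF "
      f"({cpx_eflops / RACK_EFLOPS:.0%})")
print(f"Rubin CPX GDDR7: {cpx_memory_tb:.1f} TB of {RACK_MEMORY_TB} TB")
print(f"Revenue / CAPEX: {roi_multiple:.0f}x")
```

The 144 Rubin CPX GPUs account for 4.32 of the rack's 8 exaFLOPs (54%) and about 18.4 TB of its 100 TB, so the remainder presumably comes from the Rubin GPUs and Vera CPUs. The $5B-per-$100M case equals 50×, the top of NVIDIA's stated ROI range.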

• Inference is shifting to agentic models with persistent memory and multi-step reasoning.

• Disaggregated infrastructure separates context (compute-bound) from generation (memory-bound).

• Rubin CPX GPU: 30 petaFLOPs NVFP4, 128 GB GDDR7, 3× attention acceleration.

• Vera Rubin NVL144 CPX rack: 8 exaFLOPs compute, 100 TB memory, 1.7 PB/s bandwidth.

• Built to power million-token inference workloads for software, video, and research.
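The disaggregated flow summarized above, with a context pass on one pool and a KV cache handoff to a separate generation pool, can be sketched as a toy router. All class names, worker names, and the least-loaded placement policy are hypothetical. This illustrates the general pattern and does not reproduce NVIDIA Dynamo's interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Stand-in for the key/value attention state built during the context phase."""
    request_id: int
    num_tokens: int
    produced_by: str


@dataclass
class Worker:
    name: str
    active: list = field(default_factory=list)  # KV caches of live sequences


class DisaggregatedRouter:
    """Toy router that runs context and generation on separate worker pools.

    Hypothetical illustration only; it does not reproduce NVIDIA Dynamo's API.
    """

    def __init__(self, context_pool: list, generation_pool: list):
        self.context_pool = context_pool
        self.generation_pool = generation_pool
        self._rr = 0  # round-robin cursor over the context pool

    def run_context(self, request_id: int, prompt_tokens: int) -> KVCache:
        # Context phase: one compute-heavy pass over the full prompt
        # (simulated here by recording which worker performed it).
        worker = self.context_pool[self._rr % len(self.context_pool)]
        self._rr += 1
        return KVCache(request_id, prompt_tokens, worker.name)

    def hand_off(self, cache: KVCache) -> Worker:
        # Generation phase: move the KV cache to the decode worker holding the
        # fewest live sequences; that worker then emits tokens step by step.
        worker = min(self.generation_pool, key=lambda w: len(w.active))
        worker.active.append(cache)
        return worker

    def submit(self, request_id: int, prompt_tokens: int) -> tuple:
        cache = self.run_context(request_id, prompt_tokens)
        return cache, self.hand_off(cache)


router = DisaggregatedRouter(
    [Worker("ctx-0"), Worker("ctx-1")],
    [Worker("gen-0"), Worker("gen-1"), Worker("gen-2")],
)
for rid, prompt in enumerate([1_000_000, 2_000, 50_000, 300]):
    cache, gen = router.submit(rid, prompt)
    print(f"request {rid}: {prompt:,} tokens, context on {cache.produced_by}, "
          f"generation on {gen.name}")
```

Counting live sequences is the simplest possible placement policy. A production orchestrator would also weigh KV cache size, transfer cost over the network, and reuse of cached prefixes.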

“Our Rubin CPX GPU and Vera Rubin NVL144 CPX rack set a new standard for context-aware inference, combining disaggregated architecture with full-stack orchestration to deliver unmatched efficiency and ROI,” NVIDIA stated.

🌐 Analysis: NVIDIA is addressing one of the most pressing bottlenecks in AI infrastructure: long-context inference. By running the compute-bound context phase and the memory-bound generation phase on separate hardware, Rubin CPX lets hyperscalers scale each resource to match workload demands, much as Blackwell improved training economics. The move also preempts competition from startups building specialized inference accelerators and from hyperscalers developing in-house silicon. With Rubin CPX, NVIDIA strengthens its hold on both the training and inference markets as AI models move toward multi-million-token contexts.

🌐 We’re tracking the latest developments in AI infrastructure. Follow our ongoing coverage at: https://convergedigest.com/category/ai-infrastructure/

Jim Carroll