At CES 2026, NVIDIA advanced its Rubin platform, a rack-scale AI architecture built around six tightly co-designed chips aimed at cutting training time and inference cost for large-scale AI models. The platform centers on the NVIDIA Vera CPU and Rubin GPU, linked through NVLink 6 and paired with ConnectX-9 SuperNICs, BlueField-4 DPUs, and Spectrum-6 Ethernet switches. NVIDIA positions Rubin as its next annual step beyond Blackwell, targeting agentic AI, long-context reasoning, and massive mixture-of-experts (MoE) models.
NVIDIA says Rubin delivers up to a 10x reduction in inference token cost and requires up to 4x fewer GPUs to train MoE models compared with Blackwell. The company also highlighted Spectrum-X Ethernet photonics systems, which it says provide 5x better power efficiency and improved uptime for AI fabrics. New AI-native storage capabilities, built around BlueField-4, aim to share and reuse inference context memory at scale, a growing requirement for multi-turn reasoning workloads.
NVLink 6 delivers 3.6TB/s of bidirectional bandwidth per GPU, a substantial jump over the previous generation, and scales to an aggregate 260TB/s of GPU-to-GPU bandwidth within a single NVL72 rack. NVIDIA emphasized that this bandwidth is paired with deterministic latency and full all-to-all connectivity, allowing large models to behave as if they are running on a single, massive accelerator rather than a loosely coupled cluster. The company also highlighted built-in in-network compute capabilities in the NVLink 6 switch to accelerate collective operations such as all-reduce, which are critical to distributed training and inference efficiency.
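
A rough sense of why in-switch reduction helps all-reduce can be had from a textbook traffic model. The sketch below is an idealized comparison, not a description of how NVLink 6 implements collectives: the function names and the 1 GB message size are hypothetical, and it assumes a bandwidth-optimal ring all-reduce on one side and a switch that performs the full reduction on the other.

```python
def ring_allreduce_bytes_sent(num_gpus: int, message_bytes: float) -> float:
    """Bytes each GPU sends in a ring all-reduce (reduce-scatter + all-gather)."""
    return 2 * (num_gpus - 1) / num_gpus * message_bytes


def in_network_allreduce_bytes_sent(message_bytes: float) -> float:
    """Bytes each GPU sends when the switch does the reduction (idealized model).

    Each GPU sends its buffer to the switch once and receives the reduced result.
    """
    return message_bytes


NUM_GPUS = 72          # GPUs in one NVL72 domain (from the article)
MESSAGE_BYTES = 1e9    # hypothetical 1 GB buffer, chosen only for illustration

ring = ring_allreduce_bytes_sent(NUM_GPUS, MESSAGE_BYTES)
in_network = in_network_allreduce_bytes_sent(MESSAGE_BYTES)

print(f"ring all-reduce:       {ring / 1e9:.3f} GB sent per GPU")
print(f"in-network all-reduce: {in_network / 1e9:.3f} GB sent per GPU")
print(f"traffic ratio:         {ring / in_network:.3f}x")
```

Under these assumptions each GPU injects roughly half as much data when the reduction happens in the switch; real gains depend on message size, latency, and implementation details the article does not give.
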
Beyond raw performance, NVIDIA framed NVLink 6 as a reliability and serviceability upgrade. The new NVLink switch architecture integrates tightly with Rubin’s second-generation RAS engine, enabling continuous health monitoring, fault isolation, and proactive remediation across GPUs, CPUs, and the interconnect itself. NVIDIA says the cable-free, modular tray design of the NVL72 rack—enabled in part by NVLink 6—supports up to 18x faster assembly and servicing compared with Blackwell-based systems, an increasingly important factor as AI factories scale to tens or hundreds of thousands of GPUs.
| Specification | NVLink 5 (Blackwell) | NVLink 6 (Rubin) |
|---|---|---|
| Supported architecture | NVIDIA Blackwell | NVIDIA Rubin platform |
| Max NVLink GPU domain | Up to 72 GPUs (NVL72) | Up to 72 GPUs (Vera Rubin NVL72) |
| GPU-to-GPU bandwidth (per GPU) | 1.8 TB/s bidirectional | 3.6 TB/s bidirectional |
| NVLink switch GPU-to-GPU bandwidth | 1,800 GB/s | 3,600 GB/s |
| Total aggregate NVLink bandwidth (NVL72) | 130 TB/s | 260 TB/s |
| Fabric behavior at rack scale | Scale-up NVSwitch fabric for NVL72 GPU domain | Non-blocking, fully connected all-to-all fabric across 72 GPUs |
| In-network compute (collectives acceleration) | — | Built-in in-network compute to speed collective operations |
| Positioning | Rack-scale scale-up fabric for Blackwell NVL72 systems | Rack-scale backbone for Vera Rubin NVL72 designed for MoE routing, synchronization-heavy training, and long-context inference |
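
The aggregate figures in the table can be sanity-checked from the per-GPU numbers. The short sketch below assumes the rack aggregate is simply per-GPU bidirectional bandwidth multiplied by 72 GPUs and that the quoted totals are rounded; the variable and function names are illustrative only.

```python
GPUS_PER_RACK = 72  # NVL72 domain size (from the table)

# Per-GPU bidirectional bandwidth in TB/s (from the table)
PER_GPU_TBPS = {"NVLink 5 (Blackwell)": 1.8, "NVLink 6 (Rubin)": 3.6}

# Aggregate rack bandwidth in TB/s as quoted in the table
QUOTED_AGGREGATE_TBPS = {"NVLink 5 (Blackwell)": 130, "NVLink 6 (Rubin)": 260}


def aggregate_tbps(per_gpu_tbps: float, gpus: int = GPUS_PER_RACK) -> float:
    """Assumed model: rack aggregate = per-GPU bandwidth x number of GPUs."""
    return per_gpu_tbps * gpus


for generation, per_gpu in PER_GPU_TBPS.items():
    computed = aggregate_tbps(per_gpu)
    quoted = QUOTED_AGGREGATE_TBPS[generation]
    print(f"{generation}: computed {computed:.1f} TB/s, quoted {quoted} TB/s")
```

The computed values (129.6 and 259.2 TB/s) sit just under the quoted 130 and 260 TB/s, which is consistent with rounding.
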
The Rubin rollout comes with early commitments from hyperscalers and AI infrastructure providers. Microsoft plans to deploy Vera Rubin NVL72 rack-scale systems in its next-generation Fairwater AI superfactory sites, while CoreWeave expects to offer Rubin-based systems in the second half of 2026. AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure are among the first cloud platforms slated to bring Rubin instances online, alongside broad OEM and software ecosystem support.
NVIDIA also provided new details on the Rubin platform’s integration into NVIDIA DGX SuperPOD, the company’s reference architecture for large-scale AI deployments. DGX SuperPOD remains the foundational design for deploying Rubin-based systems across enterprise, research, and cloud environments. In its largest configuration, DGX SuperPOD with DGX Vera Rubin NVL72 unifies eight NVL72 systems (576 Rubin GPUs in total), delivering up to 28.8 exaflops of FP4 performance and 600TB of fast memory. Each NVL72 system combines 36 Vera CPUs, 72 Rubin GPUs, and 18 BlueField-4 DPUs into a unified compute and memory domain. NVIDIA says the 260TB/s NVLink fabric within each rack allows the system to behave as a single AI engine, simplifying software design and improving utilization.
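
The SuperPOD totals follow from the per-rack counts, and a few derived figures fall out of the same arithmetic. The sketch below uses only the numbers quoted above; the per-GPU and per-rack values it derives are simple averages of the stated totals, not NVIDIA-published specifications, and the variable names are illustrative.

```python
RACKS = 8                    # NVL72 systems in the largest SuperPOD configuration
GPUS_PER_RACK = 72           # Rubin GPUs per NVL72
CPUS_PER_RACK = 36           # Vera CPUs per NVL72
DPUS_PER_RACK = 18           # BlueField-4 DPUs per NVL72
TOTAL_FP4_EXAFLOPS = 28.8    # quoted peak FP4 for the full SuperPOD
TOTAL_FAST_MEMORY_TB = 600   # quoted fast memory for the full SuperPOD

# Totals across the SuperPOD
total_gpus = RACKS * GPUS_PER_RACK
total_cpus = RACKS * CPUS_PER_RACK
total_dpus = RACKS * DPUS_PER_RACK

# Derived averages (assumption: totals divide evenly across racks and GPUs)
fp4_exaflops_per_rack = TOTAL_FP4_EXAFLOPS / RACKS
fp4_petaflops_per_gpu = TOTAL_FP4_EXAFLOPS * 1000 / total_gpus
fast_memory_tb_per_rack = TOTAL_FAST_MEMORY_TB / RACKS

print(f"GPUs: {total_gpus}, CPUs: {total_cpus}, DPUs: {total_dpus}")
print(f"FP4 per rack: {fp4_exaflops_per_rack:.1f} exaflops")
print(f"FP4 per GPU:  {fp4_petaflops_per_gpu:.0f} petaflops")
print(f"Fast memory per rack: {fast_memory_tb_per_rack:.0f} TB")
```

The GPU count reproduces the 576 stated in the article; the remaining values are back-of-the-envelope averages and should be read as such.
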
- Six-chip architecture: Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet
- Rack-scale systems: Vera Rubin NVL72 and HGX Rubin NVL8 for different deployment models
- NVLink 6: 3.6 TB/s per GPU and up to 260 TB/s per rack for large MoE and reasoning models
- AI-native storage: Inference Context Memory Storage Platform powered by BlueField-4
- Networking: Spectrum-X Ethernet photonics with co-packaged optics and 200G SerDes
- Operational focus: Second-generation RAS engine with real-time health monitoring and faster servicing
- Availability: Partner systems expected in the second half of 2026
“Rubin arrives at exactly the right moment, as AI computing demand for both training and inference is going through the roof,” said Jensen Huang, founder and CEO of NVIDIA. “With our annual cadence of delivering a new generation of AI supercomputers — and extreme codesign across six new chips — Rubin takes a giant leap toward the next frontier of AI.”
🌐 Analysis
Rubin and NVLink 6 underscore NVIDIA’s strategy of redefining the rack as the fundamental unit of AI compute. By combining extreme scale-up via NVLink with scale-out fabrics such as Spectrum-X Ethernet and Quantum-X800 InfiniBand, NVIDIA aims to address both intra-rack and inter-rack communication bottlenecks. As AI factories move toward hundreds of thousands of GPUs and gigawatt-scale power envelopes, the balance between proprietary scale-up fabrics and open Ethernet-based scale-out will shape competitive dynamics across hyperscalers and alternative accelerator platforms.





