NVIDIA kicked off its annual GPU Technology Conference (GTC) in San Jose, California, with a keynote from CEO Jensen Huang, unveiling a bold product roadmap set to redefine AI computing through the end of the decade. Hosted at the SAP Center, the event spotlighted the Blackwell Ultra GPU, the Vera Rubin platform, and future architectures, bolstered by cutting-edge networking and software innovations. Huang emphasized scaling up before scaling out, detailing how NVIDIA is pushing performance and efficiency to fuel the next era of AI factories. Top computer makers like Dell and cloud giants like AWS are already on board, signaling broad industry adoption.
Blackwell Ultra: A Mid-Cycle Powerhouse
NVIDIA confirmed that its Blackwell architecture, introduced at GTC 2024, is now in full production, with soaring demand. Blackwell Ultra, launching in the second half of 2025, will power the GB300 NVL72—a rack-scale system with 72 Blackwell Ultra GPUs and 36 Grace CPUs—delivering 1.5x the AI performance of the GB200 NVL72. The HGX B300 NVL16 platform promises 11x faster inference on large language models, 7x more compute, and 4x more memory (288GB of HBM3e) compared to Hopper. With 20 petaflops per chip, Blackwell Ultra is optimized for reasoning, agentic AI, and physical AI, including synthetic video generation for robotics.
The GB300 NVL72 leverages NVLink 72 to function as a single massive GPU rack, with an exaflop of compute and 18 NVLink switches distributed across nine trays. Huang described this liquid-cooled, disaggregated architecture as a fundamental departure from legacy air-cooled racks. Blackwell Ultra systems will be available through OEM partners like Cisco, Dell Technologies, HPE, Lenovo, and Supermicro, as well as cloud hyperscalers such as AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure. GPU cloud providers CoreWeave and Lambda will also offer instances.
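Taking the quoted figures at face value, the rack-level arithmetic checks out: 72 GPUs at roughly 20 petaflops each puts the rack squarely in exaflop territory. A quick back-of-the-envelope check (all numbers exactly as quoted above):

```python
# Back-of-the-envelope check of the GB300 NVL72 figures quoted above.
GPUS_PER_RACK = 72        # Blackwell Ultra GPUs in an NVL72 rack
PFLOPS_PER_GPU = 20       # petaflops per chip, as quoted

rack_pflops = GPUS_PER_RACK * PFLOPS_PER_GPU
print(f"Rack compute: {rack_pflops} PF = {rack_pflops / 1000:.2f} exaflops")
# -> Rack compute: 1440 PF = 1.44 exaflops, consistent with "an exaflop of compute"
```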

Vera Rubin: The Next Big Leap in 2026
In 2026, NVIDIA will introduce the Vera Rubin platform, which pairs the next-gen Rubin GPU (R100) with the Vera CPU. The platform is expected to double the performance of the Grace Blackwell superchip, combining eight stacks of HBM4 memory with a fully re-architected networking stack. Huang described Rubin as a “huge step up” for trillion-parameter AI models.
Rubin integrates NVLink 6 switches at 3600 GB/s, CX9 SuperNICs at 1600 Gb/s, and X1600 InfiniBand/Ethernet switches. The homogeneous system design will allow dynamic allocation of workloads across tensor, pipeline, and expert parallelism. The 50-watt Vera CPU will partner with Rubin GPUs to deliver 50 petaflops for inference, advancing the scale of future AI factories.
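The parallelism dimensions compose multiplicatively, which is what makes dynamic allocation a packing problem. The sketch below is purely illustrative; the GPU counts and splits are hypothetical, not NVIDIA figures:

```python
# Hypothetical illustration of how parallelism dimensions compose.
# None of these numbers are NVIDIA's; they only show the arithmetic.
total_gpus = 288          # a hypothetical NVLink domain

tensor_parallel = 8       # each layer's matmuls sharded across 8 GPUs
pipeline_parallel = 6     # model layers split into 6 sequential stages
expert_parallel = 3       # mixture-of-experts routed across 3 expert groups

gpus_per_replica = tensor_parallel * pipeline_parallel * expert_parallel
data_parallel = total_gpus // gpus_per_replica   # leftover dimension: full replicas

assert gpus_per_replica * data_parallel == total_gpus
print(f"{gpus_per_replica} GPUs per model replica, {data_parallel} replicas")
# -> 144 GPUs per model replica, 2 replicas
```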

Rubin Ultra and Beyond: Extreme Scale-Up in 2027
NVIDIA also previewed Rubin Ultra, expected in the second half of 2027. Featuring NVLink 576 to connect up to 576 GPUs in a rack, Rubin Ultra is designed to deliver 15 exaflops of compute and 4.6 petabytes per second of bandwidth. The system, consisting of 2.5 million parts, will draw 600 kilowatts and house four GPUs per package, setting a new bar for AI rack density. Huang acknowledged the “slightly ridiculous” specs but framed them as a necessary leap to meet AI’s extreme compute demands.
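Dividing those rack-level specs by the GPU count gives a feel for the implied per-GPU density (straight arithmetic on the figures quoted above):

```python
# Per-GPU density implied by the Rubin Ultra rack figures above.
GPUS = 576                # NVLink 576 domain
RACK_EXAFLOPS = 15        # quoted rack compute
RACK_KILOWATTS = 600      # quoted rack power draw

pf_per_gpu = RACK_EXAFLOPS * 1000 / GPUS     # exaflops -> petaflops
watts_per_gpu = RACK_KILOWATTS * 1000 / GPUS

print(f"~{pf_per_gpu:.0f} PF and ~{watts_per_gpu:.0f} W of rack power per GPU")
# -> ~26 PF and ~1042 W per GPU (the wattage includes CPUs, switches, and cooling)
```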
Looking further ahead, NVIDIA teased the Feynman architecture for 2028, again paired with the Vera CPU, continuing the company’s cadence of annual product releases in which a new architecture arrives every two years and an “Ultra” refresh fills the year in between.
Networking Innovations: Silicon Photonics and Spectrum-X
To support these AI factories, NVIDIA debuted its first co-packaged silicon photonics switches—featuring microring resonator modulators and TSMC’s 3D COUPE process—to eliminate traditional transceivers. The solution, with 1.6 Tbps per port, will ship as an InfiniBand variant later this year and a Spectrum-X Ethernet variant in 2026 with a 512-port radix. Huang noted these switches could save up to 60 megawatts per data center—equivalent to 100 Rubin Ultra racks.
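That equivalence is straightforward arithmetic given the 600-kilowatt Rubin Ultra rack figure quoted earlier:

```python
# Checking the "60 MW is about 100 Rubin Ultra racks" equivalence.
SAVINGS_MW = 60
RACK_KW = 600             # Rubin Ultra rack draw, per the roadmap above

print(f"{SAVINGS_MW * 1000 / RACK_KW:.0f} racks")   # -> 100 racks
```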
The Spectrum-X Silicon Photonics Ethernet switch enhances NVIDIA’s networking portfolio with 3.5x power savings and 10x network resilience. Blackwell Ultra GPUs will connect via 800 Gb/s ConnectX-8 SuperNICs and BlueField-3 DPUs for secure, multi-tenant AI networks. The technology will help NVIDIA expand AI factories to hundreds of thousands of GPUs, building on deployments like the Colossus cluster.
Software and AI Factories: NVIDIA Dynamo and AI Enterprise
Complementing the hardware, NVIDIA unveiled Dynamo, an open-source AI factory operating system designed to manage complex parallel AI workloads, including reasoning and inference. Dynamo orchestrates tasks such as disaggregating the prefill and decode phases of inference and optimizing token generation to maximize throughput and revenue potential. Early adopters like Perplexity.ai are already working with Dynamo to serve inference more efficiently.
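The keynote didn’t walk through Dynamo’s internals, but the disaggregation idea itself can be sketched in a few lines. The toy scheduler below is a hypothetical illustration, not Dynamo’s actual API: it routes compute-bound prefill and latency-sensitive decode to separately sized worker pools, which is the essence of the technique.

```python
# Toy illustration of prefill/decode disaggregation. This is hypothetical code,
# not Dynamo's API. Prefill (ingesting the prompt) is compute-bound; decode
# (emitting tokens one at a time) is memory-bandwidth-bound, so the two phases
# can be batched and scaled on separately sized worker pools.
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    max_new_tokens: int

class WorkerPool:
    def __init__(self, name: str, num_workers: int):
        self.name = name
        self.num_workers = num_workers

    def run(self, task: str, req: Request) -> str:
        # Placeholder for dispatching to a GPU worker in this pool.
        return f"[{self.name} x{self.num_workers}] {task} ({len(req.prompt)}-char prompt)"

prefill_pool = WorkerPool("prefill", num_workers=4)    # sized for compute throughput
decode_pool = WorkerPool("decode", num_workers=12)     # sized for generation latency

def serve(req: Request) -> None:
    # Phase 1: build the KV cache on the prefill pool...
    print(prefill_pool.run("prefill: build KV cache", req))
    # ...then hand off to the decode pool, which streams tokens from that cache.
    print(decode_pool.run(f"decode: generate up to {req.max_new_tokens} tokens", req))

serve(Request(prompt="Explain NVLink in one paragraph.", max_new_tokens=128))
```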
Alongside Dynamo, NVIDIA AI Enterprise delivers enterprise-grade microservices (NIMs) and models such as Llama Nemotron Reason, enabling customers to deploy Blackwell Ultra across cloud and on-prem AI factories. Huang framed this stack as the infrastructure layer for building agentic AI applications, likening Dynamo to “the dynamo of the AI revolution.”
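As a concrete example of the deployment model, NIM microservices expose an OpenAI-compatible HTTP API. The sketch below is illustrative only; the endpoint URL and model name are placeholders:

```python
# Illustrative call to a NIM microservice's OpenAI-compatible endpoint.
# The URL and model name below are placeholders, not real deployments.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",   # a locally hosted NIM container
    json={
        "model": "my-deployed-model",              # placeholder model identifier
        "messages": [{"role": "user", "content": "Summarize NVLink in one sentence."}],
        "max_tokens": 128,
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```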
Implications for the Industry
NVIDIA’s roadmap blends scale-up innovations—such as NVLink and liquid-cooled racks—with scale-out technologies like silicon photonics and Spectrum-X Ethernet to underpin AI factories at unprecedented scale. Huang described inference as “the ultimate extreme computing problem,” linking AI factory profitability directly to efficient token generation. The Blackwell Ultra-based GB300 NVL72 and future Rubin and Rubin Ultra platforms are optimized to drive major advancements in agentic and physical AI, while software like Dynamo ensures orchestration scales in tandem.
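That framing reduces to simple unit economics: revenue scales with tokens served per unit of power and capital. The numbers in the illustration below are entirely hypothetical:

```python
# Illustrative AI-factory unit economics. Every number here is hypothetical.
tokens_per_second = 1_000_000      # assumed sustained rack-level throughput
price_per_million_tokens = 2.00    # assumed price charged per million tokens

revenue_per_hour = tokens_per_second * 3600 / 1e6 * price_per_million_tokens
print(f"${revenue_per_hour:,.0f} per rack-hour")   # -> $7,200 per rack-hour
# At fixed power and capital cost, doubling tokens per second doubles revenue,
# which is why token-generation efficiency maps directly onto profitability.
```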
With a detailed product and architecture roadmap extending through 2028, NVIDIA is positioning itself to meet the rapidly evolving demands of large-scale AI infrastructure and deployment.