Converge Digest

Q&A: UALink 2.0, In-Network Compute, and the Future of Open AI Interconnects

Kurtis Bowman, Chairman, UALink Consortium

AI infrastructure is entering a new phase defined by massive GPU clusters, memory bandwidth constraints, and the need for efficient scale-up interconnects. The UALink Consortium recently introduced its latest specifications, including in-network compute, chiplet integration, and 200G data link advancements.

In this extended discussion, Kurtis Bowman explains how UALink is evolving as an open alternative to proprietary interconnects and what it means for next-generation AI systems.


Q: What fundamental shift in AI infrastructure is driving the need for UALink?

NextGenInfra.io: AI clusters are scaling rapidly. What problem does UALink solve at a system level?

Kurtis Bowman: The industry is hitting a major inflection point driven by exponential growth in AI clusters. A few years ago, systems had two to four GPUs. Now we’re talking about rack-scale, row-scale, and even data center-scale deployments.

At that scale, performance is dictated by memory bandwidth. If GPUs can’t access data fast enough, they sit idle. Unlike with CPUs, we can’t simply raise clock speeds or add memory channels, because of the constraints of high-bandwidth memory (HBM).

So the real challenge becomes: how do you interconnect GPUs efficiently? UALink addresses that by providing a purpose-built, memory-centric interconnect optimized for scale-up AI workloads.  


Q: How does UALink compare to NVLink, Ethernet, and other interconnects?

NextGenInfra.io: Where does UALink fit relative to existing technologies like NVLink, Ethernet, PCIe, and CXL?

Kurtis Bowman: UALink is essentially the open equivalent of NVLink. It’s designed specifically for scale-up environments, where low latency and high bandwidth are critical.

The key difference is openness. With UALink, you can mix and match CPU vendors, accelerator vendors, and switch vendors. That creates a competitive, multi-vendor ecosystem.

We also leverage an Ethernet-based PHY, which allows us to use widely available components and keep costs competitive. But unlike Ethernet, UALink is not packet-based—it supports load/store and atomic operations directly, which is essential for memory-centric workloads.

Ethernet, PCIe, and CXL still play important roles, especially for scale-out and host connectivity, but they don’t deliver the same efficiency for tightly coupled GPU clusters.  


Q: What are the key innovations in the UALink 2.0 specifications?

NextGenInfra.io: The new release includes multiple specifications. What are the most important changes?

Kurtis Bowman: There are four major updates:

This modular approach allows us to evolve different parts of the stack independently, which is critical given how fast the industry is moving.  


Q: What is in-network compute, and why does it matter?

NextGenInfra.io: In-network compute is a major addition. How does it work in practice?

Kurtis Bowman: In-network compute moves certain operations from the GPU into the switch.

For example, instead of a GPU requesting data from every other GPU in a cluster, it can send a single request to the switch. The switch collects the data, performs the computation—like a reduction—and sends back the result.

This dramatically reduces the number of traversals across the interconnect. Instead of a round trip to every other GPU, the requesting GPU makes a single round trip to the switch.

In terms of performance, we’re seeing improvements in the range of 15% to 20% for certain workloads.  
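
As a rough illustration of the traversal savings described above, the Python sketch below counts the messages crossing the requesting GPU's link in both cases. It is a toy model under stated assumptions, not the UALink protocol: the names `LinkCounter`, `gpu_side_reduce`, and `switch_side_reduce` are hypothetical, each request and each response is counted as one message, and the reduction is a plain sum.

```python
from dataclasses import dataclass


@dataclass
class LinkCounter:
    """Counts messages crossing the requesting GPU's link (toy model)."""
    messages: int = 0


def gpu_side_reduce(values, requester, link):
    """The requester fetches every peer's value itself and sums locally."""
    total = values[requester]
    for peer, value in enumerate(values):
        if peer == requester:
            continue
        link.messages += 2  # one request out, one response back
        total += value
    return total


def switch_side_reduce(values, requester, link):
    """The requester sends one request; a hypothetical switch gathers and sums."""
    link.messages += 1   # single request to the switch
    total = sum(values)  # stands in for the reduction done inside the switch
    link.messages += 1   # single reduced result returned to the requester
    return total


# Eight GPUs, each holding one value (illustrative data only).
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
gpu_link, switch_link = LinkCounter(), LinkCounter()

gpu_result = gpu_side_reduce(values, 0, gpu_link)
switch_result = switch_side_reduce(values, 0, switch_link)

print(gpu_result, gpu_link.messages)        # 36.0 14
print(switch_result, switch_link.messages)  # 36.0 2
```

In this model the requester's link carries 2 × (N − 1) messages when the GPU does the work and 2 when the switch does, for any number of GPUs N. The switch still has to reach every GPU, so the sketch says nothing about total fabric traffic or about the 15% to 20% figure quoted above.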


Q: Does this create a new class of AI switches?

NextGenInfra.io: Are we looking at fundamentally new switch architectures?

Kurtis Bowman: Yes, to some extent. These switches incorporate compute capabilities, including arithmetic units for certain operations and data movement engines for others.

They support collective operations like broadcast, reduce, and multicast—similar to what NVIDIA does with SHARP.

So you can think of this as an evolution of the switch into a more active participant in computation, rather than just a transport element.


Q: What scale are we talking about for UALink fabrics?

NextGenInfra.io: What does a typical UALink deployment look like in terms of radix and scale?

Kurtis Bowman: Current designs are targeting switches with 256 to 512 lanes.

With redundancy—typically two lanes per connection—you’re looking at around 128 to 256 GPUs per switch. For very large models, you can push higher densities with fewer redundant links.
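
The arithmetic behind those figures can be sketched in a few lines of Python. The function name `gpus_per_switch` is hypothetical, and the model assumes every switch lane is available for GPU connections, which a real design may not allow.

```python
def gpus_per_switch(switch_lanes: int, lanes_per_gpu: int) -> int:
    """Number of GPUs one switch can attach if each GPU uses `lanes_per_gpu` lanes."""
    if switch_lanes <= 0 or lanes_per_gpu <= 0:
        raise ValueError("lane counts must be positive")
    return switch_lanes // lanes_per_gpu


for lanes in (256, 512):
    redundant = gpus_per_switch(lanes, lanes_per_gpu=2)   # two lanes per connection
    single = gpus_per_switch(lanes, lanes_per_gpu=1)      # no redundant lane
    print(f"{lanes} lanes: {redundant} GPUs with redundancy, {single} without")

# 256 lanes: 128 GPUs with redundancy, 256 without
# 512 lanes: 256 GPUs with redundancy, 512 without
```

Dropping from two lanes to one per connection doubles the GPU count in this model, which is the trade-off between density and redundancy mentioned above.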

The focus is on tightly coupled scale-up environments where latency and bandwidth are critical.  


Q: Will UALink switches be hybrid with Ethernet?

NextGenInfra.io: Do you expect hybrid UALink/Ethernet switches?

Kurtis Bowman: Some will be hybrid, but others will be dedicated.

It comes down to cost and power. Ethernet requires significantly more silicon—about three to four times more—which increases power consumption.

So some vendors will prioritize flexibility, while others will optimize for efficiency with dedicated UALink switches.  


Q: How does UALink scale beyond 200G?

NextGenInfra.io: What’s the roadmap for higher speeds?

Kurtis Bowman: By separating the physical layer from the protocol stack, we allow the PHY to evolve independently.

We’re at 200G per lane today, but we can move to 400G, 800G, and beyond without requiring software changes.
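
To make the scaling concrete, the short Python sketch below computes raw per-connection bandwidth as lane count times per-lane rate. It is back-of-the-envelope arithmetic only: the function name `raw_link_gbps` is hypothetical, the two-lane connection is an assumption borrowed from the redundancy example earlier in the interview, and encoding and protocol overhead are ignored.

```python
def raw_link_gbps(lanes: int, per_lane_gbps: int) -> int:
    """Raw signaling bandwidth of one connection, ignoring all overhead."""
    if lanes <= 0 or per_lane_gbps <= 0:
        raise ValueError("lanes and per-lane rate must be positive")
    return lanes * per_lane_gbps


LANES_PER_CONNECTION = 2  # assumption, not a value fixed by the specification

for rate in (200, 400, 800):
    total = raw_link_gbps(LANES_PER_CONNECTION, rate)
    print(f"{rate}G per lane -> {total} Gb/s raw per connection")

# 200G per lane -> 400 Gb/s raw per connection
# 400G per lane -> 800 Gb/s raw per connection
# 800G per lane -> 1600 Gb/s raw per connection
```

The lane count and topology stay the same across the three rows; only the per-lane rate changes, which is the property the layered design is meant to preserve.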

We’re also working with industry groups and MSAs to stay aligned with emerging standards and adopt faster signaling as it becomes available.  


Q: How does the chiplet model accelerate adoption?

NextGenInfra.io: What role do chiplets play in the UALink ecosystem?

Kurtis Bowman: Chiplets simplify integration and accelerate time-to-market.

An accelerator vendor can focus on its core compute design and add UALink connectivity through standardized chiplets. These chiplets can be sourced from IP providers and integrated alongside the main die.

They also allow different process nodes—for example, advanced nodes for compute and more mature nodes for interconnect—which improves cost efficiency and design flexibility.  


Q: How do you view competition from NVLink, Ethernet, and emerging standards?

NextGenInfra.io: There are multiple competing approaches—NVLink, Ethernet-based scale-up, and others. How does this play out?

Kurtis Bowman: That’s typical in emerging markets. We’ve seen similar fragmentation before with technologies like CXL and Gen-Z.

Over time, these ecosystems tend to consolidate. UALink’s advantage is that it combines openness with high performance.

Ethernet will remain relevant, but it doesn’t deliver the same latency or efficiency for scale-up workloads.

So we believe UALink is well positioned as the open, high-performance alternative.  


Q: Will the industry converge on a single scale-up interconnect?

NextGenInfra.io: Is fragmentation a concern for system designers?

Kurtis Bowman: In the short term, yes. You’ll see multiple approaches coexisting.

But over time, customers will evaluate performance, cost, and total cost of ownership. That will drive convergence toward the most effective solutions.

We expect the picture to become much clearer within the next year as real deployments begin to emerge.


Q: What is the timeline for UALink deployment?

NextGenInfra.io: When will we see UALink in production systems?

Kurtis Bowman: The 1.0 specification was released in 2025, and silicon based on that is already in development.

The 2.0 features are rolling out now, with compliance programs coming later this year.

We expect full ecosystem availability—accelerators, switches, and IP—around 2027.  


Q: What signals strong ecosystem momentum for UALink?

NextGenInfra.io: What gives you confidence in adoption?

Kurtis Bowman: We now have over 100 members, including a broad range of industry leaders.

We’re seeing active development across accelerators, switches, and IP. Companies like Synopsys and Cadence are already working on IP, and multiple vendors are building silicon.

That level of engagement across the ecosystem is a strong indicator that UALink is gaining real traction.

