Broadcom has released its Scale Up Ethernet (SUE) Framework, a new architectural specification designed to optimize intra-rack and multi-rack communication for AI and HPC clusters. Built on Ethernet fundamentals and tailored for ultra-high-performance accelerator networks, the SUE framework supports up to 9.6 Tbps of bandwidth between each pair of XPUs and enables sub-2µs latency through efficient transport protocols, flow control, and packet reliability mechanisms. The specification, RM100, targets shared-memory workloads and one-sided operations such as “put,” “get,” and atomics—scenarios common in AI model training and inference.
The SUE stack is built to scale efficiently across up to 1024 XPUs, supporting strict and unordered packet flows over 800G, 400G, and 200G Ethernet ports. Using new constructs such as the AI Fabric Header (AIFH), Credit-Based Flow Control (CBFC), and Link-Level Retry (LLR), the system delivers lossless transport, congestion control, and simple Go-Back-N recovery. The modular architecture supports mesh and switched topologies, offering high configurability per application. It also integrates with Ethernet-based fabrics using low-latency SerDes, packet encapsulation options, and optional FEC tuning.
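The core idea behind CBFC is simple: a sender may only transmit a packet when the receiver has advertised a buffer credit for it, so packets are never dropped for lack of buffer space. The sketch below illustrates that principle in Python; the class names, buffer sizes, and credit-return path are illustrative assumptions, not details from the RM100 specification.

```python
# Illustrative sketch of credit-based flow control (CBFC). A sender spends one
# credit per packet; the receiver returns credits as its buffer drains, so the
# link stays lossless via back-pressure rather than packet drops.
from collections import deque


class CreditReceiver:
    def __init__(self, buffer_slots: int):
        self.free_slots = buffer_slots  # slots not yet advertised as credits
        self.buffer = deque()

    def grant_credits(self) -> int:
        """Advertise all currently free buffer slots as credits."""
        granted = self.free_slots
        self.free_slots = 0
        return granted

    def receive(self, pkt) -> None:
        self.buffer.append(pkt)  # guaranteed to fit: a credit was spent

    def drain(self) -> None:
        """Consume one buffered packet, freeing a slot for a future credit."""
        self.buffer.popleft()
        self.free_slots += 1


class CreditSender:
    def __init__(self):
        self.credits = 0

    def add_credits(self, n: int) -> None:
        self.credits += n

    def try_send(self, pkt, rx: CreditReceiver) -> bool:
        if self.credits == 0:
            return False  # back-pressure: stall until credits arrive
        self.credits -= 1
        rx.receive(pkt)
        return True


rx = CreditReceiver(buffer_slots=2)
tx = CreditSender()
tx.add_credits(rx.grant_credits())  # receiver advertises 2 credits

assert tx.try_send("pkt0", rx)
assert tx.try_send("pkt1", rx)
assert not tx.try_send("pkt2", rx)  # no credit: sender stalls, nothing is dropped

rx.drain()                           # application consumes a packet
tx.add_credits(rx.grant_credits())   # freed slot comes back as a credit
assert tx.try_send("pkt2", rx)
```

The key property this models is losslessness without drops: congestion shows up as a stalled sender, not as discarded packets on the wire.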
- SUE supports 1, 2, or 4 Ethernet ports per instance, enabling up to 9.6 Tbps between XPUs
- Provides sub-2µs round-trip latency, suitable for AI inference and training workloads
- Supports standard Ethernet headers or a compressed AIFH format
- Enables lossless transport via Link-Level Retry (LLR), PFC, or CBFC
- Built on a shared memory model with one-sided operations (put/get/atomic)
- Designed for up to 1024 XPUs per cluster in switched or mesh configurations
- Each SUE instance includes command, management, and Ethernet interfaces
- Go-Back-N retransmit logic ensures packet reliability with minimal transport state
- Offers load balancing across multi-port SUE configurations based on real-time congestion
- Defines latency-optimized FEC options, including RS-272 for reduced overhead
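The "minimal transport state" claim for Go-Back-N is worth unpacking: the sender needs to track only the oldest unacknowledged sequence number, because on any loss it simply resends everything from that point while the receiver discards out-of-order packets. The short simulation below sketches that behavior; it is a generic Go-Back-N illustration, not code from the SUE specification.

```python
# Illustrative simulation of Go-Back-N recovery. The sender's only transport
# state is `base` (oldest unacked sequence number); after a loss it retransmits
# from `base`, and the receiver accepts packets strictly in order.
def go_back_n_deliver(packets, drop_first_attempt):
    """Deliver `packets` in order; each sequence number in `drop_first_attempt`
    is lost exactly once, triggering a Go-Back-N retransmission."""
    delivered = []
    expected = 0                  # receiver: next in-order sequence number
    dropped = set(drop_first_attempt)
    base = 0                      # sender: oldest unacknowledged packet
    while base < len(packets):
        # Transmit (or retransmit) from `base`.
        for seq in range(base, len(packets)):
            if seq in dropped:
                dropped.discard(seq)  # lost on this attempt only
                break                 # gap at receiver; trailing packets are discarded
            if seq == expected:
                delivered.append(packets[seq])
                expected += 1
        base = expected           # cumulative ACK advances the window
    return delivered


data = [f"pkt{i}" for i in range(5)]
assert go_back_n_deliver(data, drop_first_attempt={2}) == data
```

The trade-off this illustrates is why Go-Back-N suits scale-up fabrics: retransmitting a whole window is wasteful on lossy links, but with LLR, PFC, or CBFC keeping the link effectively lossless, retransmissions are rare and the near-zero per-flow state is what matters.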
With RM100, Broadcom positions Ethernet as a viable, efficient alternative to proprietary interconnects for AI scale-up clusters, aligning with the needs of next-generation GPU, TPU, and custom accelerator systems.
Broadcom is contributing the specification to the Open Compute Project.