The Aurora supercomputer at the U.S. Department of Energy’s Argonne National Laboratory has officially surpassed the exascale threshold, achieving over a quintillion calculations per second, as announced today at the ISC High Performance 2024 conference in Hamburg, Germany.
Built by Intel and Hewlett Packard Enterprise (HPE), Aurora features a groundbreaking architecture, including 63,744 graphics processing units (GPUs), making it the world’s largest GPU-powered system, with more interconnect endpoints than any other system to date.
Aurora Architecture Highlights
Processing Units
- Intel CPUs: Aurora is equipped with Intel Xeon CPU Max Series processors.
- Intel GPUs: The system includes Intel Data Center GPU Max Series GPUs (code-named Ponte Vecchio), designed for high-performance computing (HPC) and artificial intelligence (AI) workloads.
Performance
- Exascale Performance: Aurora now delivers more than one exaFLOPS (10^18 floating-point operations per second, i.e., a quintillion calculations per second), placing it among the first exascale systems in the world.
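To put 10^18 in perspective, here is a minimal back-of-envelope sketch in C++; the 8-billion-person comparison is an illustrative assumption, not a figure from the announcement.

```cpp
#include <cstdio>

int main() {
    // One exaFLOPS = 10^18 floating-point operations per second.
    const double exaflops = 1e18;
    // Illustrative assumption: 8 billion people, each performing
    // one calculation per second.
    const double people = 8e9;
    const double seconds = exaflops / people;             // 1.25e8 s
    const double years = seconds / (365.25 * 24 * 3600);  // ~4 years
    printf("Humanity would need %.2e seconds (about %.1f years)\n"
           "to match one second of exascale computation.\n",
           seconds, years);
    return 0;
}
```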
Memory
- High-Bandwidth Memory: Aurora incorporates high-bandwidth memory (HBM) for both its CPUs and GPUs, which enhances data transfer rates and overall computational efficiency.
- Unified Memory Architecture: The system uses a unified memory architecture that allows for seamless data sharing between CPUs and GPUs, reducing latency and improving performance.
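Aurora’s software stack is built around Intel’s oneAPI, where the shared CPU–GPU address space described above is exposed through SYCL unified shared memory (USM). The following SYCL 2020 sketch illustrates the pattern; it is a generic example, not Aurora-specific code.

```cpp
#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    sycl::queue q;  // picks a default device (a GPU if one is available)

    const size_t n = 1 << 20;
    // Unified shared memory: one pointer valid on both host and device,
    // with the runtime migrating data as needed.
    float *data = sycl::malloc_shared<float>(n, q);

    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;   // host writes

    q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
        data[i] *= 2.0f;                             // device updates in place
    }).wait();

    printf("data[0] = %.1f\n", data[0]);             // host reads, no explicit copy
    sycl::free(data, q);
    return 0;
}
```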
Interconnect
- Cray Slingshot: Aurora uses HPE’s Cray Slingshot high-speed interconnect, which offers advanced network capabilities, low latency, and high bandwidth. Slingshot is based on Ethernet technology rather than InfiniBand.
- Per-Link Throughput: Each link in the Slingshot network provides up to 200 gigabits per second (Gbps) of bandwidth. This high per-link throughput ensures the rapid data transfer crucial for the vast data sets and intensive computations typical of HPC workloads (see the conversion sketch after this list).
- Network Scalability: Slingshot’s architecture scales to very large node counts, providing the high aggregate bandwidth needed to support the thousands of nodes in an exascale deployment.
- Adaptive Routing: Dynamically selects optimal paths to avoid congestion and improve efficiency.
- Quality of Service (QoS): Offers multiple QoS levels to prioritize critical traffic.
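To make the per-link figure concrete, here is a quick line-rate-to-bytes conversion; the 1 TB dataset is an assumed example payload, not a number from the announcement.

```cpp
#include <cstdio>

int main() {
    // 200 Gbps per link, quoted as bits per second.
    const double link_gbps = 200.0;
    const double bytes_per_sec = link_gbps * 1e9 / 8.0;  // 25 GB/s per link
    // Hypothetical payload: a 1 TB dataset.
    const double dataset_bytes = 1e12;
    printf("One link moves %.0f GB/s; a 1 TB dataset takes ~%.0f s per link.\n",
           bytes_per_sec / 1e9, dataset_bytes / bytes_per_sec);
    return 0;
}
```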
Storage
- Lustre File System: Aurora is expected to use the Lustre parallel file system, providing fast and scalable storage that can handle the immense data throughput generated by exascale computing workloads.
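Applications typically drive a parallel file system such as Lustre through collective I/O; the MPI-IO sketch below shows each rank writing its own slice of one shared file, which Lustre can stripe across many storage targets. The file name and buffer size are illustrative, and this is a generic pattern rather than Aurora’s actual I/O configuration.

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Each rank fills a buffer and writes it to a disjoint, contiguous
    // region of a single shared file.
    const int count = 1 << 20;  // doubles per rank
    std::vector<double> buf(count, static_cast<double>(rank));

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "shared_output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset offset =
        static_cast<MPI_Offset>(rank) * count * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf.data(), count,
                          MPI_DOUBLE, MPI_STATUS_IGNORE);  // collective write
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}
```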
The installation team, comprising staff from Argonne, Intel, and HPE, is focused on system validation, verification, and scaling up. They are addressing various hardware and software issues as the system approaches full-scale operations.
“Aurora is fundamentally transforming how we do science for our country,” Argonne Laboratory Director Paul Kearns said. “It will accelerate scientific discovery by combining high performance computing and AI to fight climate change, develop life-saving medical treatments, create new materials, understand the universe and so much more.”