Astera Labs introduced its Scorpio X-Series 320-lane Smart Fabric Switch, positioning it as a high-radix, memory-semantic interconnect designed to support large-scale AI clusters with reduced latency and improved efficiency. The device is now shipping to hyperscalers and targets production AI deployments where multi-trillion-parameter models and distributed agentic workloads are stressing traditional interconnect architectures.
The Scorpio X-Series integrates memory-semantic connectivity, allowing accelerators to access shared fabric resources using native load/store operations instead of software-managed messaging. This approach reduces protocol overhead and improves fabric efficiency at scale. The platform also incorporates hardware-accelerated Hypercast and in-network compute engines, which the company says can deliver up to 2x the performance on collective operations such as all-reduce and all-to-all, improving time-to-first-token and tokens-per-watt metrics.
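A simplified bandwidth model helps show why in-network reduction can approach a ~2x gain on all-reduce. The sketch below is an idealized illustration, not Astera Labs' published analysis: it assumes a bandwidth-bound transfer, ignores latency and protocol overhead, and compares a classic ring all-reduce against a switch that sums one contribution per GPU and multicasts the result back. All function names are hypothetical.

```python
# Idealized per-GPU uplink traffic for an all-reduce of S bytes across N GPUs.
# Assumption: bandwidth-bound regime; latency, headers, and protocol overhead ignored.


def ring_allreduce_bytes_sent(size_bytes: float, n_gpus: int) -> float:
    """Bytes each GPU transmits in a ring all-reduce (reduce-scatter + all-gather)."""
    # Each of the two phases moves (N-1)/N of the buffer per GPU.
    return 2 * (n_gpus - 1) / n_gpus * size_bytes


def in_network_allreduce_bytes_sent(size_bytes: float, n_gpus: int) -> float:
    """Bytes each GPU transmits when the switch performs the reduction.

    Each GPU sends its buffer to the switch once; the switch sums all
    contributions and multicasts the result, so the uplink carries S bytes.
    n_gpus is unused and kept only to mirror the ring version's signature.
    """
    return float(size_bytes)


def traffic_ratio(size_bytes: float, n_gpus: int) -> float:
    """Ring traffic divided by in-network traffic (an upper bound on the gain here)."""
    return (ring_allreduce_bytes_sent(size_bytes, n_gpus)
            / in_network_allreduce_bytes_sent(size_bytes, n_gpus))


def switch_allreduce(buffers: list[list[float]]) -> list[list[float]]:
    """Functional model of the switch: sum one vector per GPU, return a copy to each."""
    reduced = [sum(vals) for vals in zip(*buffers)]
    return [list(reduced) for _ in buffers]


if __name__ == "__main__":
    size = 256 * 2**20  # hypothetical 256 MiB gradient buffer
    for n in (8, 32, 80):
        print(f"N={n}: ratio {traffic_ratio(size, n):.3f}x")
```

As N grows, the ring's per-GPU traffic approaches 2S while the in-network version stays at S, which is where a ceiling of roughly 2x comes from in this model.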
Astera Labs expanded its broader Scorpio portfolio with the PCIe 6.0-based P-Series, spanning 32 to 320 lanes to support diverse accelerator configurations and system topologies. The COSMOS software stack unifies management across the platform, offering telemetry, diagnostics, and non-disruptive updates to maintain uptime in large AI clusters. The company said the scale-up switching silicon market could reach $20 billion by 2030, with Scorpio production ramping in the second half of 2026.
- Scorpio X-Series delivers 320 lanes, enabling high-radix, single-hop scale-up topologies
- Supports up to ~80 GPUs per switch with reduced hops versus legacy multi-switch designs (see page 9 diagram)
- Bandwidth per switch scales to ~20 Tbps vs ~9 Tbps for prior-generation designs
- Hypercast and in-network compute engines accelerate collective operations by up to 2x
- Memory-semantic fabric enables native load/store access across accelerators
- Scorpio P-Series expands PCIe fabric options from 32 to 320 lanes
- COSMOS software provides unified management, telemetry, and resiliency features
- Targets hyperscalers, AI labs, and neo-clouds building heterogeneous accelerator clusters
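The lane and bandwidth figures above can be sanity-checked with simple arithmetic. This sketch assumes 64 GT/s per lane per direction (the PCIe 6.0 signaling rate) and x4 links per GPU; the source states neither for the X-Series, and encoding and FLIT overhead are ignored. The helper names are hypothetical.

```python
# Back-of-envelope lane arithmetic for a 320-lane switch.
# Assumptions (not from the source): 64 GT/s raw per lane per direction,
# x4 links per GPU, and no encoding/FLIT overhead.

LANE_RATE_GBPS = 64  # assumed raw per-lane rate, one direction


def raw_bandwidth_tbps(lanes: int, lane_rate_gbps: float = LANE_RATE_GBPS) -> float:
    """Aggregate raw bandwidth in Tb/s, one direction."""
    return lanes * lane_rate_gbps / 1000


def gpus_per_switch(lanes: int, lanes_per_gpu: int) -> int:
    """GPUs that fit if every lane is used for GPU links of the given width."""
    return lanes // lanes_per_gpu


print(raw_bandwidth_tbps(320))   # 20.48
print(gpus_per_switch(320, 4))   # 80
```

Under these assumptions, 320 lanes land at about 20 Tbps and about 80 x4-attached GPUs, matching the list's approximate figures.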
“The frontier models driving today’s most demanding AI applications require connectivity infrastructure that keeps pace with the accelerators powering them,” said Jitendra Mohan, CEO of Astera Labs.
🌐 Analysis: Hypercast is Astera Labs’ purpose-built multicast mechanism for AI workloads. It targets a fundamental bottleneck in modern GPU clusters: the rapid growth of communication overhead driven by mixture-of-experts (MoE) models and large-scale collective operations. In MoE architectures, each token is dynamically routed to a subset of experts distributed across multiple GPUs, turning every routing decision into a multicast event. Traditional switching architectures struggle here because they either lack sufficient multicast group capacity or require slow, unpredictable configuration times (often hundreds of microseconds to milliseconds), introducing latency variability that directly affects model performance and user experience.
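To make the "every routing decision is a multicast event" point concrete, the sketch below maps each token's top-k experts to the GPUs hosting them and counts how many copies the source GPU must transmit with and without switch-side replication. The expert placement, top-2 routing, and helper names are all hypothetical; this is a generic model of MoE dispatch, not Hypercast's implementation.

```python
# Hypothetical MoE dispatch model: which GPUs must receive each token, and how
# many copies the source GPU sends with unicast vs. switch-side multicast.


def destination_gpus(expert_ids, expert_to_gpu):
    """GPUs that must receive a token: the deduplicated hosts of its chosen experts."""
    return frozenset(expert_to_gpu[e] for e in expert_ids)


def source_link_copies(routing, expert_to_gpu, src_gpu, multicast):
    """Token copies the source GPU transmits on its own link.

    unicast: one copy per remote destination GPU.
    multicast: one copy per token; the switch replicates it to every destination.
    """
    copies = 0
    for expert_ids in routing:
        remote = destination_gpus(expert_ids, expert_to_gpu) - {src_gpu}
        if remote:
            copies += 1 if multicast else len(remote)
    return copies


def distinct_groups(routing, expert_to_gpu, src_gpu):
    """Distinct remote destination sets, i.e. multicast groups the fabric must hold."""
    groups = set()
    for expert_ids in routing:
        remote = destination_gpus(expert_ids, expert_to_gpu) - {src_gpu}
        if remote:
            groups.add(remote)
    return groups


# Example: 8 experts on 4 GPUs (2 per GPU), each token routed to its top-2 experts.
expert_to_gpu = {e: e // 2 for e in range(8)}
routing = [(0, 5), (2, 7), (1, 3), (4, 6)]
unicast = source_link_copies(routing, expert_to_gpu, src_gpu=0, multicast=False)
multicast = source_link_copies(routing, expert_to_gpu, src_gpu=0, multicast=True)
groups = distinct_groups(routing, expert_to_gpu, src_gpu=0)
```

Even this four-token batch needs four distinct groups. With G GPUs there are up to 2^G - 1 possible destination sets, which is why multicast group capacity and setup time matter at cluster scale.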
🌐 Analysis: Hypercast addresses this by creating lightweight, pre-configurable multicast groups that can be instantiated quickly and at scale, enabling deterministic, low-latency data distribution across GPUs. This is critical not only for MoE routing but also for dense collective operations such as all-gather and all-to-all, which occur frequently during both training and inference. By accelerating these operations in hardware and eliminating redundant data replication or slow control-plane setup, Hypercast improves GPU utilization, reduces idle time, and increases tokens-per-watt efficiency. The net effect is a more predictable and efficient fabric, where communication no longer constrains model architecture or forces compromises in expert placement and routing decisions.
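The general idea of pre-configured multicast groups can be illustrated with a toy fixed-capacity table keyed by destination bitmask. This is a generic sketch under stated assumptions: the source does not describe Hypercast's actual data structures, capacity, or configuration interface, and every name below is hypothetical.

```python
# Toy model of pre-configured multicast groups (hypothetical, not Hypercast's design).


class MulticastGroupTable:
    """Fixed-capacity table of multicast groups keyed by destination bitmask.

    Bit i set in a mask means GPU i receives the data. Installing groups ahead
    of time keeps setup off the critical path; a send whose group is absent
    falls back to per-destination unicast.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._groups: dict[int, int] = {}  # bitmask -> group id

    def preconfigure(self, dest_masks):
        """Install groups before traffic starts; return how many were installed."""
        installed = 0
        for m in dest_masks:
            if m in self._groups:
                continue  # already installed
            if len(self._groups) >= self.capacity:
                break  # table full
            self._groups[m] = len(self._groups)
            installed += 1
        return installed

    def plan_send(self, dest_mask: int):
        """Return ('multicast', group_id) if installed, else ('unicast', [gpu ids])."""
        gid = self._groups.get(dest_mask)
        if gid is not None:
            return ("multicast", gid)
        gpus = [i for i in range(dest_mask.bit_length()) if (dest_mask >> i) & 1]
        return ("unicast", gpus)


def mask(*gpus: int) -> int:
    """Build a destination bitmask from GPU indices."""
    m = 0
    for g in gpus:
        m |= 1 << g
    return m


table = MulticastGroupTable(capacity=2)
table.preconfigure([mask(1, 3), mask(2, 3), mask(1, 2)])  # third group does not fit
```

A lookup for an installed group is a single dictionary hit, while an uninstalled destination set degrades to one unicast per receiver, the redundant replication the paragraph above describes.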
🌐 Analysis: Strategically, Hypercast is central to Astera Labs’ positioning in the AI interconnect market. It moves the company beyond traditional PCIe switching into the domain of intelligent, AI-aware fabrics with in-network compute capabilities.
