Arista Networks has introduced new AI networking capabilities designed to improve performance and efficiency in large-scale AI workloads. The latest advancements in the EOS Smart AI Suite include Cluster Load Balancing (CLB), an AI-optimized Ethernet-based load balancing solution, and CloudVision Universal Network Observability (CV UNO), which provides AI job-centric visibility and troubleshooting. These innovations aim to reduce network latency, optimize bandwidth utilization, and ensure seamless AI workload execution at scale.
Arista’s Cluster Load Balancing (CLB) uses RDMA-aware flow placement to distribute AI traffic evenly and prevent bottlenecks in Ethernet-based AI clusters. Unlike traditional load balancing methods, which can distribute traffic unevenly, CLB dynamically optimizes traffic in both directions, from leaf to spine and from spine to leaf. This approach improves efficiency in machine learning clusters, which becomes more important as AI workloads grow in complexity. Meanwhile, CV UNO integrates network, system, and AI job data to provide real-time visibility into AI job performance, reducing troubleshooting time and improving reliability.
• Optimized AI traffic flow: CLB ensures low-latency, balanced traffic distribution across AI clusters.
• Real-time AI job monitoring: Tracks congestion, packet drops, and link utilization for precise performance insights.
• Deep-dive AI analytics: Identifies bottlenecks by analyzing network devices, server NICs, and RDMA errors.
• Advanced flow visualization: Maps AI job flows at microsecond granularity for faster issue resolution.
• Proactive AI performance management: Correlates network and compute performance to prevent workload disruptions.
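The contrast between traditional load balancing and load-aware flow placement can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Arista's CLB algorithm, whose internals are not described here. It assumes the traditional method is static hash-based (ECMP-style) placement, uses a simple greedy least-loaded policy as a stand-in for load-aware placement, and treats every flow as equal in size. The function names, addresses, and link count are all hypothetical.

```python
import zlib

def hash_placement(flows, num_links):
    """Static ECMP-style placement (assumed baseline): a hash of the
    flow's header fields picks the uplink, regardless of current load."""
    load = [0] * num_links
    for flow in flows:
        key = "|".join(map(str, flow)).encode()
        load[zlib.crc32(key) % num_links] += 1
    return load

def least_loaded_placement(flows, num_links):
    """Greedy load-aware placement (illustrative stand-in, not CLB):
    each new flow goes to the uplink currently carrying the fewest flows."""
    load = [0] * num_links
    for _ in flows:
        link = load.index(min(load))
        load[link] += 1
    return load

# Hypothetical RDMA-style flows: (src_ip, dst_ip, src_port, dst_port).
# 4791 is the UDP destination port used by RoCEv2.
flows = [(f"10.0.0.{i}", f"10.0.1.{i}", 49152 + i, 4791) for i in range(16)]
NUM_UPLINKS = 4

hashed = hash_placement(flows, NUM_UPLINKS)
balanced = least_loaded_placement(flows, NUM_UPLINKS)

print("flows per uplink, hash-based:  ", hashed)
print("flows per uplink, least-loaded:", balanced)
print("busiest uplink:", max(hashed), "vs", max(balanced))
```

With 16 equal flows and 4 uplinks, the least-loaded policy places exactly 4 flows on each link, while the hash-based result depends on how the header fields happen to hash and can leave some links busier than others. Real AI clusters add complications this sketch ignores, such as flows of different sizes and congestion on the return path.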
“As Oracle continues to grow its AI infrastructure leveraging Arista switches, we see a need for advanced load balancing techniques to help avoid flow contentions and increase throughput in ML networks,” said Jag Brar, Vice President and Distinguished Engineer at Oracle Cloud Infrastructure.