Meta Deploys Tens of Millions of AWS Graviton5 Cores

Jim Carroll

2 months ago

The Rise of CPU Power in Agentic AI

Meta will expand its AI infrastructure footprint with a large-scale deployment of Amazon Web Services (AWS) Graviton5 processors, targeting the growing class of agentic AI workloads that demand high-performance CPU resources. The agreement positions Meta Platforms as one of the largest customers for Graviton-based infrastructure, with an initial rollout spanning tens of millions of cores and the flexibility to scale further.

The deployment underscores a shift in AI infrastructure design. While GPUs remain central to model training, Meta is scaling CPU capacity to support real-time inference, orchestration, and multi-step reasoning tasks associated with agentic AI systems. These workloads, which range from code generation to search and task coordination, require massively parallel processing and low-latency communication across distributed compute environments. AWS Graviton5, built on a 3nm process and featuring 192 cores with a cache that AWS describes as five times larger than the prior generation's, targets these requirements by improving inter-core communication and overall throughput.

The infrastructure stack integrates tightly with AWS services, including the Nitro System for hardware-level virtualization, Elastic Fabric Adapter (EFA) for low-latency networking, and support for bare-metal access. Meta also continues to leverage AWS’s broader AI platform, including Amazon Bedrock, as part of its evolving AI architecture. The result is a hybrid compute model that combines GPUs for training with purpose-built CPUs for large-scale inference and orchestration across billions of user interactions.

Deployment begins with tens of millions of Graviton cores, with expansion planned
Meta Platforms becomes one of the largest global users of AWS Graviton infrastructure
Focus on CPU-intensive agentic AI workloads: reasoning, orchestration, search, and code generation
Graviton5 features 192 cores, 5x larger cache, and up to 25% performance improvement vs. prior generation
Built on 3nm process technology for improved power efficiency
Integration with AWS Nitro System enables secure, high-performance virtualization and bare-metal access
Elastic Fabric Adapter (EFA) supports low-latency, high-bandwidth interconnects for distributed AI workloads
Continued use of AWS AI services, including Amazon Bedrock, to support large-scale deployment
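
To give a sense of scale, the core counts above can be turned into a rough processor count. The sketch below is a back-of-envelope estimate only: the 192-core figure comes from the article, but the fleet sizes of 10 million and 50 million cores are illustrative assumptions (the announcement says only "tens of millions"), as is the assumption that every core sits on a 192-core Graviton5 part. The function name chips_needed is hypothetical.

```python
import math

# Graviton5 core count as cited in the article.
CORES_PER_CHIP = 192


def chips_needed(total_cores: int, cores_per_chip: int = CORES_PER_CHIP) -> int:
    """Return the smallest number of processors supplying at least total_cores."""
    if total_cores < 0 or cores_per_chip <= 0:
        raise ValueError("total_cores must be >= 0 and cores_per_chip must be > 0")
    return math.ceil(total_cores / cores_per_chip)


if __name__ == "__main__":
    # Illustrative fleet sizes; the article does not give an exact figure.
    for assumed_cores in (10_000_000, 50_000_000):
        print(f"{assumed_cores:,} cores -> {chips_needed(assumed_cores):,} processors")
```

Under these assumptions, 10 million cores works out to 52,084 processors and 50 million cores to 260,417, which suggests a deployment on the order of tens to hundreds of thousands of chips.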

“This isn’t just about chips; it’s about giving customers the infrastructure foundation, as well as data and inference services, to build AI that understands, anticipates, and scales efficiently to billions of people worldwide,” said Nafea Bshara, vice president and distinguished engineer, Amazon.

🌐 Analysis: The rise of AWS Graviton reflects a decade-long strategy by Amazon Web Services to vertically integrate its infrastructure stack, beginning with silicon design and extending into full data center architecture. The trajectory shows how custom Arm-based CPUs evolved from a cost experiment into a cornerstone of hyperscale AI infrastructure.

AWS Graviton Development Timeline

2015 – Acquisition of Annapurna Labs
AWS acquires Annapurna Labs, an Israeli chip design firm, laying the foundation for in-house silicon development. The move anticipates later efforts by other hyperscalers to control performance, cost, and power efficiency at the hardware level.
2018 – Graviton1 (Arm Cortex-A72)
First-generation Graviton launches as a low-cost alternative for scale-out cloud workloads. It validates Arm viability in the data center but remains limited in performance and adoption.
2019 – Graviton2 (Arm Neoverse N1)
Major architectural leap with 64 cores and significantly improved performance. Gains broad adoption across general-purpose workloads (M6g, C6g), establishing Arm as a credible x86 alternative in cloud environments.
2021 – Graviton3
Targets high-performance computing and machine learning. Introduces DDR5 memory and bfloat16 support, enabling better efficiency for AI inference and vector workloads.
2023 – Graviton4 (Neoverse V2, 96 cores)
Raises the core count to 96 and increases memory bandwidth by roughly 75% vs. Graviton3. Strengthens security by encrypting all high-speed physical hardware interfaces. Positions Graviton for databases, analytics, and large-scale enterprise workloads.
2025–2026 – Graviton5 (192 cores, 3nm)
Doubles the core count relative to Graviton4 (96 to 192) and focuses on agentic AI workloads such as orchestration, reasoning, and real-time inference. Enhanced cache and interconnect efficiency target distributed, latency-sensitive AI systems at hyperscale.
2026 – Hyperscale Adoption Milestone (Meta Deployment)
Meta Platforms commits to deploying tens of millions of Graviton cores, signaling that Graviton has reached Tier-1 infrastructure status for AI alongside GPUs.

AWS Graviton Processor Comparison: Graviton4 vs. Graviton5
Feature	Graviton4	Graviton5
Launch timeframe	2023	2025–2026 (current generation)
CPU cores	96 cores	192 cores
Architecture	Arm Neoverse V2	Custom Arm-based (AWS-designed)
Process node	4nm (TSMC)	3nm (TSMC)
Performance uplift	~30% vs. Graviton3 (AWS-reported)	~25% vs. Graviton4 (AWS-reported)
Cache	Large shared L3 cache (expanded vs. G3)	Significantly larger aggregate cache (~5x vs. prior gen, AWS claim)
Memory	DDR5 with increased bandwidth	Enhanced DDR5, higher bandwidth and efficiency
AI / ML support	Optimized for ML inference, vector workloads	Optimized for agentic AI orchestration, real-time reasoning workloads
Primary workloads	Databases, analytics, HPC, scale-out services	Agentic AI, large-scale inference, distributed orchestration, real-time services
Design & manufacturing	AWS (Annapurna Labs) / TSMC	AWS (Annapurna Labs) / TSMC
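
The two AWS-reported uplift figures in the table can be chained to approximate how Graviton5 compares with Graviton3. The sketch below is illustrative arithmetic only. It assumes the two percentages multiply cleanly, which vendor "up to" figures measured on different benchmarks may not do, so the result is best read as a rough upper bound. The function name compound_uplift is hypothetical.

```python
def compound_uplift(*gains: float) -> float:
    """Multiply successive fractional gains (0.30 means +30%) into one speedup factor."""
    factor = 1.0
    for gain in gains:
        factor *= 1.0 + gain
    return factor


if __name__ == "__main__":
    g4_vs_g3 = 0.30  # ~30% vs. Graviton3 (AWS-reported, from the table)
    g5_vs_g4 = 0.25  # ~25% vs. Graviton4 (AWS-reported, from the table)
    total = compound_uplift(g4_vs_g3, g5_vs_g4)
    print(f"Graviton5 vs. Graviton3: about {(total - 1) * 100:.1f}% faster")
```

Chaining the two figures gives a factor of 1.30 × 1.25 = 1.625, or about 62.5% over two generations, if both headline numbers held for the same workload.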