Google Cloud announced the general availability of Trillium, its sixth-generation Tensor Processing Unit (TPU), which powers the company's AI Hypercomputer architecture. Designed for the demands of multimodal AI models such as Gemini 2.0, Trillium delivers significant gains in performance, efficiency, and scalability. The AI Hypercomputer connects over 100,000 Trillium chips through a Jupiter network fabric with 13 petabits per second of bisection bandwidth, enabling large-scale distributed training for enterprise and startup customers.
Trillium TPUs provide over a 4x improvement in training performance, up to a 3x increase in inference throughput, and a 67% boost in energy efficiency compared to the previous generation. Updates to Google Cloud's AI Hypercomputer include enhancements to the XLA compiler and to popular frameworks such as JAX, PyTorch, and TensorFlow, optimizing price-performance across AI workloads. Features such as host offloading, which uses the host's large pool of DRAM to complement the chips' High Bandwidth Memory (HBM), further improve operational efficiency. Together, these changes let Trillium scale training workloads efficiently and support large language models, embedding-intensive tasks, and inference scheduling.
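To illustrate how these framework-level optimizations surface to developers, the following is a minimal, hypothetical JAX sketch of a data-parallel training step: jax.jit hands partitioning to the XLA compiler, which shards the batch across whatever TPU chips are attached. The model, shapes, and train_step function are illustrative assumptions, not Google's code.

```python
# Minimal sketch: a data-parallel training step in JAX, partitioned by XLA.
# All names and shapes here are illustrative, not from Google's codebase.
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over all attached chips (TPU, or CPU fallback).
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))
batch_sharded = NamedSharding(mesh, P("data"))  # split batch dim across chips
replicated = NamedSharding(mesh, P())           # keep params on every chip

def loss_fn(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

@jax.jit  # XLA compiles and partitions this step across the mesh
def train_step(w, x, y, lr=1e-3):
    return w - lr * jax.grad(loss_fn)(w, x, y)

n = len(jax.devices())
x = jax.device_put(np.ones((8 * n, 128), np.float32), batch_sharded)
y = jax.device_put(np.zeros((8 * n, 1), np.float32), batch_sharded)
w = jax.device_put(np.zeros((128, 1), np.float32), replicated)
w = train_step(w, x, y)  # runs one sharded SGD step on all devices
```

The same step function runs unchanged on one chip or thousands; only the mesh grows, which is the property the compiler-level work is meant to preserve.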
AI21 Labs, a long-time TPU customer, is already leveraging Trillium to enhance its language models. Barak Lenz, CTO of AI21 Labs, stated, “The advancements in scale, speed, and cost-efficiency are significant. We believe Trillium will be essential in accelerating the development of our next generation of sophisticated language models.”
Key improvements in Trillium include:
• 4.7x increase in peak compute performance per chip.
• Double the HBM capacity and interchip interconnect bandwidth.
• Supports up to 99% scaling efficiency for distributed training across 12 pods (3,072 chips; see the sketch after this list).
• Achieves up to 2.5x better training performance per dollar.
• Used to train Gemini 2.0 and other advanced AI models.
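To put the scaling figure in concrete terms, here is a back-of-the-envelope reading of the claim using the numbers from the list above; throughput is in normalized single-chip units and the 256-chip pod size is Trillium's published configuration.

```python
# Back-of-the-envelope reading of the scaling claim above.
# Throughput is in normalized "single-chip" units, purely illustrative.
chips_per_pod = 256                    # Trillium pod size
pods = 12
chips = chips_per_pod * pods           # 3,072 chips, as cited above
scaling_efficiency = 0.99              # claimed efficiency at this scale
effective_speedup = chips * scaling_efficiency
print(f"{chips} chips -> ~{effective_speedup:.0f}x one chip's throughput")
# 3072 chips -> ~3041x one chip's throughput
```

In other words, near-linear scaling: a 3,072-chip job at 99% efficiency delivers roughly 3,041 chips' worth of useful throughput rather than losing a large fraction to communication overhead.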