Google unveiled its fifth-generation Tensor Processing Unit (TPU v5e), promising up to 2x higher training performance per dollar and up to 2.5x higher inference performance per dollar for LLMs and gen AI models compared to Cloud TPU v4.
Each TPU v5e pod interconnects up to 256 chips, delivering an aggregate bandwidth of more than 400 Tb/s and 100 petaOps of INT8 performance.
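As a rough back-of-envelope check, the pod-level figures quoted above can be divided down to per-chip numbers. These derived values are illustrative only, not official per-chip specifications:

```python
# Back-of-envelope arithmetic from the pod-level figures quoted above.
# Per-chip numbers are derived for illustration, not official specs.
chips_per_pod = 256        # max chips in one TPU v5e pod
pod_int8_petaops = 100     # aggregate INT8 performance per pod
pod_bandwidth_tbps = 400   # aggregate interconnect bandwidth (lower bound)

per_chip_int8_teraops = pod_int8_petaops * 1000 / chips_per_pod
per_chip_bandwidth_gbps = pod_bandwidth_tbps * 1000 / chips_per_pod

print(f"~{per_chip_int8_teraops:.0f} INT8 teraOps per chip")
print(f"~{per_chip_bandwidth_gbps:.0f} Gb/s per chip")
```

Since the 400 Tb/s figure is stated as a lower bound, the per-chip bandwidth estimate is a floor rather than an exact value.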
TPU v5e also supports eight different VM configurations, ranging from a single chip to more than 250 chips within a single slice, letting customers pick the right configuration for a wide range of LLM and gen AI model sizes.
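One plausible reading of those eight configurations is a power-of-two ladder of slice sizes topping out at a full 256-chip pod. The specific chip counts below are an assumption for illustration, not an official list from the announcement:

```python
# Hypothetical slice-size ladder for the eight TPU v5e VM configurations.
# The chip counts are an assumption (power-of-two TPU topologies),
# not an official list from Google's announcement.
v5e_slice_sizes = [1, 4, 8, 16, 32, 64, 128, 256]

assert len(v5e_slice_sizes) == 8    # matches the eight configurations
assert max(v5e_slice_sizes) > 250   # "more than 250 chips" in one slice

for chips in v5e_slice_sizes:
    print(f"slice of {chips} chip(s)")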
Google Cloud also introduced Multislice technology, which lets users scale AI models beyond the boundaries of a physical TPU pod, so training jobs can now span tens of thousands of Cloud TPU v5e or TPU v4 chips. Previously, TPU training jobs were limited to a single slice of chips, capping the largest jobs at 3,072 chips, the maximum slice size for TPU v4.
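To put the scaling claim in concrete terms, here is a small sketch of the arithmetic. The slice count is a hypothetical example, not a published configuration or limit:

```python
# Illustrative Multislice scaling arithmetic based on the figures above.
# The number of slices is a hypothetical example, not an official limit.
V5E_MAX_SLICE = 256    # largest single TPU v5e slice (chips)
V4_MAX_SLICE = 3_072   # previous cap: largest single TPU v4 slice

num_slices = 64        # hypothetical Multislice job spanning 64 slices
multislice_chips = num_slices * V5E_MAX_SLICE

print(f"{multislice_chips} chips in one job")
print(f"{multislice_chips / V4_MAX_SLICE:.1f}x the old single-slice cap")
```

Even this modest hypothetical configuration lands well above the previous 3,072-chip ceiling, consistent with the "tens of thousands of chips" claim for larger slice counts.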