Elon Musk’s artificial intelligence startup, xAI, has announced plans to significantly expand its Colossus supercomputer located in Memphis, Tennessee. The expansion aims to increase the system’s capacity from its current 100,000 Nvidia H100 GPUs to over one million GPUs, incorporating both Nvidia H100 and H200 models. This development is expected to position Memphis as a leading hub for AI computing technology.
The project involves substantial infrastructure enhancements, including upgrades to the local power supply. The Tennessee Valley Authority (TVA) and Memphis Light, Gas and Water (MLGW) are collaborating to increase capacity to 150 megawatts by constructing new substations. This expansion is anticipated to create numerous high-tech job opportunities in the region. However, local environmental groups have raised concerns about potential impacts on air quality and water resources, calling for greater transparency and community involvement in the project’s development.
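To put the 150-megawatt figure in context, here is a rough back-of-envelope sketch of peak GPU power draw at the two scales mentioned above. It assumes 700 W per GPU, the maximum thermal design power NVIDIA lists for the H100 SXM module. That figure and the helper name `gpu_power_mw` are illustrative assumptions, not numbers from xAI, TVA, or MLGW. The estimate also ignores cooling, networking, CPUs, and real-world utilization.

```python
# Back-of-envelope estimate of peak GPU-only power draw.
# ASSUMPTION: 700 W per GPU (NVIDIA's listed max TDP for the H100 SXM module);
# actual per-GPU draw at Colossus is not stated in the article.
GPU_TDP_WATTS = 700
PLANNED_CAPACITY_MW = 150  # figure reported for the TVA/MLGW upgrade


def gpu_power_mw(gpu_count: int, watts_per_gpu: float = GPU_TDP_WATTS) -> float:
    """Return peak GPU-only draw in megawatts for a given GPU count."""
    return gpu_count * watts_per_gpu / 1_000_000


current_mw = gpu_power_mw(100_000)
target_mw = gpu_power_mw(1_000_000)

print(f"100,000 GPUs:   {current_mw:.0f} MW")
print(f"1,000,000 GPUs: {target_mw:.0f} MW")
print(f"Share of {PLANNED_CAPACITY_MW} MW used by 100,000 GPUs: "
      f"{current_mw / PLANNED_CAPACITY_MW:.0%}")
```

Under these assumptions, 100,000 GPUs alone would draw roughly 70 MW at peak, while one million would draw roughly 700 MW. This suggests the 150-megawatt upgrade corresponds to an earlier stage of the build-out rather than the full million-GPU target, though the real figures depend on details the article does not give.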
Key Points:
• Supercomputer Expansion: Colossus will scale from 100,000 to over one million GPUs, integrating Nvidia H100 and H200 models.
• Infrastructure Upgrades: TVA and MLGW plan to boost power capacity to 150 megawatts through new substations.
• Economic Impact: The expansion is expected to generate significant high-tech employment opportunities in Memphis.
• Environmental Concerns: Community groups are calling for transparency about potential effects on air quality and water resources.
• Project Timeline: The expansion is slated for completion by 2026, aiming to establish Memphis as a global AI computing center.
This initiative underscores xAI’s commitment to advancing AI capabilities while highlighting the importance of balancing technological progress with environmental and community considerations.
In October, NVIDIA confirmed that xAI’s Colossus supercomputer in Memphis, Tennessee, reached its scale of 100,000 NVIDIA Hopper GPUs using the NVIDIA Spectrum-X Ethernet networking platform. Designed for large-scale AI workloads, Spectrum-X is built on standards-based Ethernet and provides high-efficiency remote direct memory access (RDMA) networking across AI data centers. The Colossus AI cluster, the largest of its kind, trains xAI’s Grok language models, which power the chatbot features available to X Premium subscribers.
- 100,000 NVIDIA Hopper GPUs power Colossus, expanding to 200,000
- Spectrum-X Ethernet platform achieves 95% data throughput with RDMA
- Built in 122 days; training began 19 days after the first rack arrived on the floor
- Uses adaptive routing, congestion control, and enhanced performance isolation