
Elon Musk says xAI aims to secure 50 million H100-equivalent AI GPUs within five years; about 230,000 GPUs, including 30,000 GB200s, are reportedly already operational for training the AI model Grok.

Elon Musk's xAI project targets 50 ExaFLOPS of AI training compute, equivalent to 50 million H100 GPUs, within five years. Nvidia's rapid generational performance gains make the target plausible, but even if future architectures shrink the count to fewer than a million GPUs, the cluster would still demand enormous power.


Elon Musk's xAI is making ambitious strides in artificial intelligence: the goal is an AI supercluster capable of 50 ExaFLOPS of training compute within the next five years. That target, however, comes with a significant energy challenge.

The proposed supercluster would consist of approximately 50 million Nvidia H100-equivalent AI accelerators. Under load, each H100 GPU draws about 700 watts (W), so the raw power consumption of such a supercluster would be an astounding 35 gigawatts (GW), roughly the output of 35 nuclear reactors at about 1 GW apiece.
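The arithmetic behind that figure is straightforward; a quick sketch using only the numbers above:

```python
# Back-of-envelope power estimate from the article's own figures:
# 50 million H100-class GPUs drawing ~700 W each under load.
num_gpus = 50_000_000
watts_per_gpu = 700

total_watts = num_gpus * watts_per_gpu
total_gw = total_watts / 1e9   # watts -> gigawatts

print(total_gw)  # 35.0 GW, roughly 35 large (~1 GW) nuclear reactors
```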

A 35 GW draw is unrealistic for today's technology and infrastructure. More efficient future architectures, such as the Rubin Ultra and Feynman GPU designs, may double performance per watt or more with each generation. With such improvements compounding, the power requirement could fall to about 4.6 GW, still a substantial amount, enough to power millions of homes or a small country.
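As a quick check, the two power figures quoted for the same 50 ExaFLOPS target imply a sizable efficiency factor (a sketch using only numbers from the text):

```python
# Efficiency factor implied by the article's two power figures for the
# same compute target: 35 GW with today's H100s vs ~4.6 GW later.
power_today_gw = 35.0
power_future_gw = 4.6

improvement = power_today_gw / power_future_gw
print(round(improvement, 1))  # ~7.6x less power for the same target
```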

For comparison, xAI's current Colossus 2 data center, with around one million AI accelerators, is expected to consume about 1.4 to 1.96 GW. The Colossus 2 cluster will comprise 550,000 GB200 and GB300 nodes with two GPUs each, about 1.1 million GPUs in total.
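The node count and power range above imply a per-GPU power budget well above a single GPU's own draw, which plausibly reflects cooling and facility overhead. A cross-check using only the article's figures:

```python
# Cross-checking the Colossus 2 numbers quoted above (all from the article).
nodes = 550_000
gpus_per_node = 2
total_gpus = nodes * gpus_per_node            # 1,100,000 accelerators

low_gw, high_gw = 1.4, 1.96                   # quoted cluster power range
per_gpu_w_low = low_gw * 1e9 / total_gpus     # ~1,273 W per GPU
per_gpu_w_high = high_gw * 1e9 / total_gpus   # ~1,782 W per GPU

print(total_gpus, round(per_gpu_w_low), round(per_gpu_w_high))
```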

The first Colossus 2 nodes are expected to come online in the coming weeks. Assuming Nvidia achieves the aforementioned performance increases across its subsequent generations of AI accelerators based on the Rubin and Feynman architectures, around 650,000 Feynman Ultra GPUs would be needed to reach 50 ExaFLOPS sometime in 2029.
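As an illustration of how such generational doublings compound (the per-step doubling and the step count here are assumptions for illustration, not roadmap figures):

```python
# If per-GPU training performance doubles at each step, the GPU count
# needed for a fixed compute target halves at each step.
baseline_gpus = 50_000_000        # H100-equivalents needed today
for step in range(1, 7):
    needed = baseline_gpus // 2 ** step
    print(step, needed)
# Six doublings bring the count to 781,250, the same order of magnitude
# as the ~650,000 Feynman Ultra GPUs cited above.
```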

It is worth noting that this large total power figure covers only the AI training compute itself; it excludes additional infrastructure such as cooling, networking, and redundancy, as well as inference workloads, all of which would further increase total energy demand.

In terms of performance, one H100 AI accelerator delivers around 1,000 FP16/BF16 TFLOPS for AI training. Nvidia's Blackwell B200 offers roughly 1,000 times the inference performance of the 2016 Pascal P100: around 20,000 FP4 TFLOPS versus the P100's 19 FP16 TFLOPS. Notably, Nvidia achieves part of these gains by refreshing a known architecture on the same production node (e.g., Blackwell -> Blackwell Ultra, Rubin -> Rubin Ultra) rather than moving each design to a new process technology.
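The ratio behind that Pascal-to-Blackwell comparison, with the caveat that it sets FP4 throughput against FP16 throughput:

```python
# Throughput ratio from the figures quoted in the text.
b200_fp4_tflops = 20_000
p100_fp16_tflops = 19

gain = b200_fp4_tflops / p100_fp16_tflops
print(round(gain))  # 1053, i.e. roughly a 1,000x generational gain
```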

The energy demands of such a supercluster are a formidable challenge. With today's hardware, 50 million H100-equivalent GPUs would draw power comparable to the output of 35 nuclear reactors, so meeting the 50 ExaFLOPS target within five years will hinge on more efficient hardware, such as the Rubin Ultra and Feynman GPU designs, and on more efficient supporting infrastructure.

Investors in AI-driven technology would do well to pay particular attention to the energy efficiency of these advanced systems, given the substantial power consumption involved.
