xAI’s Colossus supercomputer cluster uses 100,000 Nvidia Hopper GPUs — and it was all made possible using Nvidia’s Spectrum-X Ethernet networking platform

Nvidia H100
(Image credit: Nvidia)

  • Nvidia and xAI collaborate on Colossus development
  • xAI has markedly cut down 'flow collisions' during AI model training
  • Spectrum-X has been crucial in training the Grok AI model family

Nvidia has shed light on how xAI’s ‘Colossus’ supercomputer cluster can keep a handle on 100,000 Hopper GPUs - and it’s all down to using the chipmaker's Spectrum-X Ethernet networking platform.

Spectrum-X, the company revealed, is designed to provide massive performance capabilities to multi-tenant, hyperscale AI factories using its Remote Direct Memory Access (RDMA) network.

The platform has been deployed at Colossus, the world’s largest AI supercomputer, since its inception. xAI, the Elon Musk-owned firm, has been using the cluster to train its Grok series of large language models (LLMs), which power the chatbots offered to X users.

The facility was built in collaboration with Nvidia in just 122 days, and xAI is currently in the process of expanding it, with plans to deploy a total of 200,000 Nvidia Hopper GPUs.

Training Grok takes serious firepower

The Grok AI models are extremely large, with Grok-1 weighing in at 314 billion parameters and Grok-2 outperforming Claude 3.5 Sonnet and GPT-4 Turbo at the time of launch in August.

Naturally, training these models requires significant network performance. Using Nvidia’s Spectrum-X platform, xAI recorded zero application latency degradation or packet loss as a result of ‘flow collisions’, or bottlenecks within AI networking paths.

xAI revealed it has been able to maintain 95% data throughput thanks to Spectrum-X’s congestion control capabilities. The company added that this level of performance cannot be delivered at this scale via standard Ethernet.

At this scale, traditional Ethernet typically creates thousands of flow collisions while delivering only 60% data throughput, according to Nvidia.
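For readers who want a feel for why flow collisions arise, here is a toy model in Python. It is a sketch under stated assumptions, not a description of how Spectrum-X or any real switch behaves: it assumes each flow is pinned to one of several equal-cost links by static hashing, modelled as a seeded uniform random choice, and that flows sharing a link split its bandwidth equally. The function names are hypothetical, and the figures it produces are illustrative only; they will not reproduce the 95% and 60% numbers quoted by Nvidia.

```python
import random
from collections import Counter


def count_collisions(num_flows, num_links, seed=0):
    """Place flows on links at random and count the resulting collisions.

    Assumption: static hash-based placement is approximated by a uniform
    random choice of link per flow (seeded so runs are repeatable).
    Returns (collisions, flows_on_busiest_link).
    """
    rng = random.Random(seed)
    placement = Counter(rng.randrange(num_links) for _ in range(num_flows))
    # A link carrying k flows counts as k - 1 collisions.
    collisions = sum(k - 1 for k in placement.values() if k > 1)
    busiest = max(placement.values())
    return collisions, busiest


def worst_flow_share(flows_on_busiest_link):
    """Fraction of link bandwidth each flow gets on the busiest link,
    assuming the link is shared equally between its flows."""
    return 1.0 / flows_on_busiest_link


if __name__ == "__main__":
    collisions, busiest = count_collisions(num_flows=64, num_links=64)
    print("collisions:", collisions)
    print("worst per-flow share:", worst_flow_share(busiest))
```

Even when there are as many links as flows, random placement tends to leave some links idle while others carry two or more flows, and the slowest flow is the one that holds back a synchronized training step.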

A spokesperson for xAI said the combination of Hopper GPUs and Spectrum-X has allowed the company to “push the boundaries of training AI models” and created a “super-accelerated and optimized AI factory”.

“AI is becoming mission-critical and requires increased performance, security, scalability and cost-efficiency,” said Gilad Shainer, senior vice president of networking at Nvidia.

“The Nvidia Spectrum-X Ethernet networking platform is designed to provide innovators such as xAI with faster processing, analysis and execution of AI workloads, and in turn accelerates the development, deployment and time to market of AI solutions.”

The Spectrum-X platform includes the Spectrum SN5600 Ethernet switch, which supports port speeds of up to 800Gb/s and is based on the Spectrum-4 switch ASIC, according to Nvidia.

xAI opted to combine the Spectrum SN5600 switch with Nvidia BlueField-3 SuperNICs for higher performance.
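To put the quoted percentages in context, the short Python sketch below works through the arithmetic for a single 800Gb/s port. It assumes, purely for illustration, that the 95% and 60% throughput figures can be read as a share of one port's line rate; the article does not say how the figures were measured, and the helper names and the 95GB payload are hypothetical.

```python
PORT_SPEED_GBPS = 800  # top SN5600 port speed quoted by Nvidia


def effective_gbps(line_rate_gbps, throughput_percent):
    """Usable bandwidth if throughput_percent of the line rate is achieved."""
    return line_rate_gbps * throughput_percent / 100


def transfer_seconds(payload_gigabytes, rate_gbps):
    """Seconds needed to move a payload at the given rate (8 bits per byte)."""
    return payload_gigabytes * 8 / rate_gbps


if __name__ == "__main__":
    for label, percent in (("Spectrum-X", 95), ("standard Ethernet", 60)):
        rate = effective_gbps(PORT_SPEED_GBPS, percent)
        seconds = transfer_seconds(95, rate)
        print(f"{label}: {rate:.0f} Gb/s usable, 95 GB in {seconds:.2f} s")
```

Under that assumption, the gap works out to 760Gb/s versus 480Gb/s of usable bandwidth per port.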

News and Analysis Editor, ITPro

Ross Kelly is News & Analysis Editor at ITPro, responsible for leading the brand's news output and in-depth reporting on the latest stories from across the business technology landscape.
