Most formidable supercomputer ever is warming up for ChatGPT 5 — thousands of 'old' AMD GPU accelerators crunched 1-trillion parameter models

(Image credit: Oak Ridge National Laboratory)

The most powerful supercomputer in the world has used just over 8% of the GPUs it's fitted with to train a large language model (LLM) containing one trillion parameters – comparable to OpenAI's GPT-4.

Frontier, based at Oak Ridge National Laboratory, used 3,072 of its AMD Instinct MI250X GPUs to train an AI system at the trillion-parameter scale, and it used 1,024 of these GPUs (roughly 2.7%) to train a 175-billion-parameter model, essentially the same size as ChatGPT.
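
For a rough sense of scale, those percentages can be sanity-checked against Frontier's published configuration. The sketch below assumes roughly 9,408 compute nodes with four MI250X GPUs each; that node count is our assumption for illustration, not a figure taken from the paper.

```python
# Rough sanity check of the GPU fractions quoted above.
# Assumption: Frontier has ~9,408 compute nodes, each with 4 MI250X GPUs.
nodes = 9_408
gpus_per_node = 4
total_gpus = nodes * gpus_per_node  # ~37,632 GPUs in total

for gpus_used, model in [(3_072, "1-trillion"), (1_024, "175-billion")]:
    share = gpus_used / total_gpus * 100
    print(f"{model}-parameter run: {gpus_used} GPUs = {share:.1f}% of Frontier")
# 1-trillion-parameter run: 3072 GPUs = 8.2% of Frontier
# 175-billion-parameter run: 1024 GPUs = 2.7% of Frontier
```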

According to their paper, the researchers needed a minimum of 14TB of memory to achieve these results, but each MI250X GPU has only 64GB of VRAM, so the model had to be split across groups of GPUs. That introduced a further challenge in the form of parallelism: the GPUs had to communicate quickly and efficiently, and keep doing so as the overall pool of resources used to train the LLM grew.
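
A back-of-the-envelope calculation using the two figures above shows why a single GPU was never an option; the sketch below simply divides the quoted memory requirement by the per-GPU capacity.

```python
# Minimal sketch: why a trillion-parameter model cannot fit on one MI250X GPU.
model_memory_gb = 14_000  # minimum memory cited in the paper (14TB), in GB
gpu_memory_gb = 64        # VRAM per MI250X GPU, in GB

min_gpus = -(-model_memory_gb // gpu_memory_gb)  # ceiling division
print(f"At least {min_gpus} GPUs are needed just to hold the training state")
# -> At least 219 GPUs are needed just to hold the training state
```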

Putting the world's most powerful supercomputer to work

LLMs aren't typically trained on supercomputers; rather, they're trained on specialized servers and require many more GPUs. ChatGPT, for example, was trained on more than 20,000 GPUs, according to TrendForce. But the researchers wanted to show whether they could train an LLM much more quickly and efficiently by harnessing techniques made possible by the supercomputer's architecture.

The scientists used a combination of tensor parallelism – groups of GPUs sharing parts of the same tensor – and pipeline parallelism – groups of GPUs hosting neighboring layers of the model. They also employed data parallelism to consume a larger number of tokens at once and to draw on a larger pool of computing resources. The overall effect was a much shorter training time.
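
To illustrate how the three forms of parallelism multiply together, here is a minimal sketch; the group sizes of 8 and 16 are hypothetical choices for illustration, not the configuration reported in the paper.

```python
# Hypothetical decomposition of 3,072 GPUs into three kinds of parallelism.
# Tensor parallelism: a group of GPUs shares parts of the same tensor.
# Pipeline parallelism: groups of GPUs host neighboring layers of the model.
# Data parallelism: full copies of the model each consume different batches of tokens.
total_gpus = 3_072

tensor_parallel = 8     # assumed tensor-parallel group size (illustrative)
pipeline_parallel = 16  # assumed number of pipeline stages (illustrative)

gpus_per_replica = tensor_parallel * pipeline_parallel  # 128 GPUs per model copy
data_parallel = total_gpus // gpus_per_replica          # 24 copies training in parallel

assert tensor_parallel * pipeline_parallel * data_parallel == total_gpus
print(f"{data_parallel} model replicas, each spread across {gpus_per_replica} GPUs")
```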

For the 22-billion-parameter model, they achieved 38.38% of peak GPU throughput (73.5 TFLOPS); for the 175-billion-parameter model, 36.14% (69.2 TFLOPS); and for the 1-trillion-parameter model, 31.96% (61.2 TFLOPS).
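
Those percentages express achieved throughput per GPU as a fraction of the hardware's theoretical peak. The sketch below assumes a half-precision peak of roughly 191.5 TFLOPS per GPU (the commonly quoted figure for one MI250X compute die); that peak value is our assumption, not stated in the article.

```python
# Achieved per-GPU throughput as a fraction of theoretical peak.
# Assumption: ~191.5 TFLOPS half-precision peak per GPU (one MI250X compute die).
peak_tflops = 191.5

for model, achieved_tflops in [("22B", 73.5), ("175B", 69.2), ("1T", 61.2)]:
    print(f"{model}: {achieved_tflops} TFLOPS = {achieved_tflops / peak_tflops:.2%} of peak")
# 22B: 73.5 TFLOPS = 38.38% of peak
# 175B: 69.2 TFLOPS = 36.14% of peak
# 1T: 61.2 TFLOPS = 31.96% of peak
```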

They also achieved 100% weak scaling efficiency, as well as 89.93% strong scaling efficiency for the 175-billion-parameter model and 87.05% for the 1-trillion-parameter model.
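
Weak scaling holds the per-GPU workload fixed as GPUs are added (ideally the time per step stays flat), while strong scaling holds the total workload fixed (ideally the time drops in proportion to the GPU count). The sketch below shows the standard formulas; the timings in the example are placeholders, not measurements from the paper.

```python
# Standard scaling-efficiency formulas; the example timings are placeholders.
def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    # Per-GPU workload fixed: ideal efficiency means the time stays constant.
    return t_base / t_scaled * 100

def strong_scaling_efficiency(t_base: float, n_base: int,
                              t_scaled: float, n_scaled: int) -> float:
    # Total workload fixed: ideal speedup equals the increase in GPU count.
    speedup = t_base / t_scaled
    ideal_speedup = n_scaled / n_base
    return speedup / ideal_speedup * 100

# Placeholder example: doubling the GPUs cuts step time from 100s to 55.6s.
print(f"{strong_scaling_efficiency(100.0, 1024, 55.6, 2048):.1f}% strong scaling efficiency")
# -> 89.9% strong scaling efficiency
```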

Although the researchers were open about the computing resources used and the techniques involved, they didn't say how long it took to train an LLM in this way.

TechRadar Pro asked the researchers for timings, but they had not responded at the time of writing.

Keumars Afifi-Sabet
Channel Editor (Technology), Live Science

Keumars Afifi-Sabet is the Technology Editor for Live Science. He has written for a variety of publications including ITPro, The Week Digital and ComputerActive. He has worked as a technology journalist for more than five years, having previously held the role of features editor with ITPro. In that role, he oversaw the commissioning and publishing of long-form content in areas including AI, cyber security, cloud computing and digital transformation.
