Hey Presto! Nvidia pulls software hack out of AI hat and doubles performance of H100 GPU for free

Nvidia H100 Tensor Core GPUs
(Image credit: Nvidia)

Nvidia is teaming up with a number of technology partners on a game-changing piece of software that's set to double the performance of its flagship H100 Tensor Core GPUs.

The open source TensorRT-LLM update, which is set for release in the coming weeks, lets an H100 running the new software outperform the A100 by a factor of eight, where H100s previously beat the A100 by only a factor of four. This was measured on GPT-J 6B, a model used in the test to summarise articles from CNN and the Daily Mail.

When tested on Meta's Llama 2 LLM, TensorRT-LLM-powered H100s outperformed A100s by 4.6 times – versus 2.6 times before the update.

Nvidia H100s faster than ever

The versatility and dynamism of large language models (LLMs) can make it difficult to batch requests and execute them in parallel, which means some requests finish much earlier than others.

To solve this, Nvidia and its partners built a more powerful scheduling technique called in-flight batching into TensorRT-LLM. This takes advantage of the fact that text generation can be broken down into multiple subtasks.

Put simply, instead of waiting for every request in a batch to finish before moving on to the next batch, the system evicts completed requests as they finish and immediately starts processing new ones, keeping the GPU busy with work from different requests in parallel (see the sketch below).
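To illustrate how in-flight batching keeps the GPU busy, here's a minimal Python sketch. Nothing here comes from TensorRT-LLM itself: the Request class, decode_step function and batch-size limit are hypothetical stand-ins, and a real scheduler also has to manage things like KV-cache memory.

```python
# Toy illustration of in-flight (continuous) batching. Not TensorRT-LLM code:
# Request, decode_step and the batch-size limit are hypothetical stand-ins.
from collections import deque
from dataclasses import dataclass, field
import random

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    generated: list = field(default_factory=list)

    def finished(self) -> bool:
        return len(self.generated) >= self.max_new_tokens

def decode_step(batch):
    """Pretend to run one decoding step for every active request."""
    for req in batch:
        req.generated.append(f"tok{len(req.generated)}")

def serve(incoming: deque, max_batch_size: int = 4):
    active = []
    while incoming or active:
        # Evict finished requests and immediately pull in new ones,
        # instead of waiting for the whole batch to drain.
        active = [r for r in active if not r.finished()]
        while incoming and len(active) < max_batch_size:
            active.append(incoming.popleft())
        decode_step(active)

requests = deque(Request(f"prompt {i}", random.randint(2, 8)) for i in range(10))
serve(requests)
```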

TensorRT-LLM combines a TensorRT deep learning compiler with optimized kernels, pre-processing and post-processing steps, and multi-GPU and multi-node communication primitives.

The result? Groundbreaking performance on Nvidia's GPUs, paving the way for new large language model experimentation, quick customization, and faster inference.

The software also uses tensor parallelism, in which individual weight matrices are split across devices, allowing efficient inference at scale: each model runs in parallel across multiple GPUs and, where needed, across multiple servers (a simplified sketch follows below).
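Here's a minimal sketch of the idea behind tensor parallelism, using NumPy and a single linear layer as a stand-in for a real model. The two "devices" are simulated in one process; in practice each shard would live on its own GPU and the final concatenation would be a collective operation such as an all-gather.

```python
# Minimal sketch of tensor parallelism for a single linear layer y = x @ W.
# Each "device" is simulated here; in a real deployment each shard sits on a
# different GPU and the results are combined with a collective operation.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))      # batch of activations
W = rng.standard_normal((8, 16))     # full weight matrix

# Column-parallel split: each device holds half of W's output columns.
W_shards = np.split(W, 2, axis=1)

# Each device computes its partial result independently...
partials = [x @ shard for shard in W_shards]

# ...and the full output is recovered by concatenating along the column axis.
y_parallel = np.concatenate(partials, axis=1)

assert np.allclose(y_parallel, x @ W)
```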

TensorRT-LLM also includes fully optimized, ready-to-run versions of popular LLMs including Llama 2, GPT-2 and GPT-3, as well as Falcon, Mosaic MPT, BLOOM, and dozens of others. These can be accessed through a Python API.
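For a flavour of what calling one of these pre-built models can look like, here's a rough sketch based on TensorRT-LLM's high-level Python API. The class names and arguments shown (LLM, SamplingParams, generate) reflect later releases of the library and may not match the early-access build described here, so treat them as assumptions rather than exact usage.

```python
# Rough sketch of running a pre-optimized Llama 2 model through TensorRT-LLM's
# Python API. Names and arguments are assumptions based on the library's
# high-level API and may differ between releases.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")           # builds or loads an optimized engine
params = SamplingParams(max_tokens=64, temperature=0.8)

outputs = llm.generate(["Summarise: Nvidia's TensorRT-LLM update ..."], params)
for out in outputs:
    print(out.outputs[0].text)
```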

The update is available in early access and will soon be integrated into the Nvidia NeMo framework, which is part of Nvidia AI Enterprise. Researchers can access it through the NeMo framework, the NGC portal, or the source repository on GitHub.

Keumars Afifi-Sabet
Channel Editor (Technology), Live Science

Keumars Afifi-Sabet is the Technology Editor for Live Science. He has written for a variety of publications including ITPro, The Week Digital and ComputerActive. He has worked as a technology journalist for more than five years, having previously held the role of features editor with ITPro. In his previous role, he oversaw the commissioning and publishing of long form in areas including AI, cyber security, cloud computing and digital transformation.
