Apple embraces Nvidia GPUs to accelerate LLM inference via its open source ReDrafter tech


  • ReDrafter delivers 2.7x more tokens per second compared to traditional auto-regression
  • ReDrafter could reduce latency for users while using fewer GPUs
  • Apple hasn't said when ReDrafter will be deployed on rival AI GPUs from AMD and Intel

Apple has announced a collaboration with Nvidia to accelerate large language model inference using its open source technology, Recurrent Drafter (or ReDrafter for short).

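The partnership aims to address the computational cost of auto-regressive token generation, in which a model produces one token at a time and each step must wait for the previous one to finish. Easing that bottleneck is crucial for improving efficiency and reducing latency in real-time LLM applications.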
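
To see why auto-regressive generation is the bottleneck, here is a toy Python sketch (not Apple’s or Nvidia’s code). The `model` argument is a hypothetical stand-in for an LLM that returns its single most likely next token; each call corresponds to one forward pass, and the loop cannot start a new call until the previous one has returned.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # hypothetical greedy "model": sequence in, next token out


def autoregressive_decode(model: NextToken, prompt: List[Token], n_new: int) -> Tuple[List[Token], int]:
    """Plain greedy decoding: every new token needs its own sequential call to the model."""
    seq = list(prompt)
    calls = 0
    for _ in range(n_new):
        seq.append(model(seq))  # step t cannot start until step t-1 has produced its token
        calls += 1
    return seq, calls


# Toy model over digits 0-9 that always counts up by one.
tokens, calls = autoregressive_decode(lambda s: (s[-1] + 1) % 10, [0], 8)
print(tokens, calls)  # [0, 1, 2, 3, 4, 5, 6, 7, 8] 8
```

With a real LLM, each of those calls is a full pass through a model with billions of parameters, so latency grows with every token generated.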

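ReDrafter, which Apple published and open-sourced earlier in 2024, takes a speculative decoding approach by combining a recurrent neural network (RNN) draft model with beam search and dynamic tree attention. The lightweight draft model proposes several candidate tokens ahead, and the full LLM verifies them in a single pass, keeping those it agrees with rather than generating every token one by one. Apple’s benchmarks show that this method generates 2.7x more tokens per second than traditional auto-regression.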
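
The sketch below illustrates the general idea of speculative decoding with beam-search drafts, under heavy simplifying assumptions: it is a toy written for this article, not ReDrafter’s implementation. `target` and `draft` are hypothetical functions standing in for the full LLM and the lightweight draft model (ReDrafter’s real draft model is an RNN; here it is any function that returns scored candidate tokens), verification is greedy, and a dictionary of already-checked prefixes stands in for dynamic tree attention, which avoids recomputing prefixes shared by several beams.

```python
import math
from typing import Callable, Dict, List, Tuple

Token = int
Target = Callable[[List[Token]], Token]                     # hypothetical full model: greedy next token
Draft = Callable[[List[Token]], List[Tuple[Token, float]]]  # hypothetical draft model: (token, log-prob) pairs


def draft_beams(draft: Draft, seq: List[Token], beam_width: int, draft_len: int) -> List[List[Token]]:
    """Beam search over the cheap draft model, proposing `beam_width` candidate continuations."""
    beams: List[Tuple[List[Token], float]] = [([], 0.0)]
    for _ in range(draft_len):
        expanded = [
            (toks + [tok], score + lp)
            for toks, score in beams
            for tok, lp in draft(seq + toks)
        ]
        expanded.sort(key=lambda b: b[1], reverse=True)  # highest cumulative log-prob first
        beams = expanded[:beam_width]
    return [toks for toks, _ in beams]


def verify(target: Target, seq: List[Token], beams: List[List[Token]]) -> List[Token]:
    """Accept the longest beam prefix the target agrees with, plus one token from the target."""
    # Memo keyed by prefix: beams that share a prefix reuse its result instead of recomputing it
    # (a stand-in for the prefix deduplication tree attention performs inside one forward pass).
    cache: Dict[Tuple[Token, ...], Token] = {}

    def target_next(prefix: Tuple[Token, ...]) -> Token:
        if prefix not in cache:
            cache[prefix] = target(seq + list(prefix))
        return cache[prefix]

    best: List[Token] = []
    for beam in beams:
        accepted: List[Token] = []
        for tok in beam:
            if target_next(tuple(accepted)) != tok:
                break  # first disagreement: discard the rest of this beam
            accepted.append(tok)
        if len(accepted) > len(best):
            best = accepted
    # The target's own next token after the accepted prefix is always valid, so each round makes progress.
    return best + [target_next(tuple(best))]


def speculative_decode(target: Target, draft: Draft, prompt: List[Token], n_new: int,
                       beam_width: int = 2, draft_len: int = 3) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; the output matches plain greedy decoding with `target`."""
    seq = list(prompt)
    rounds = 0  # verification rounds (each would be one batched forward pass of the full model)
    while len(seq) - len(prompt) < n_new:
        seq.extend(verify(target, seq, draft_beams(draft, seq, beam_width, draft_len)))
        rounds += 1
    return seq[: len(prompt) + n_new], rounds


# Toy models over digits 0-9: the target always counts up by one; the draft usually agrees.
target = lambda s: (s[-1] + 1) % 10
draft = lambda s: [((s[-1] + 1) % 10, math.log(0.6)), ((s[-1] + 2) % 10, math.log(0.4))]
tokens, rounds = speculative_decode(target, draft, [0], 8)
print(tokens, rounds)  # [0, 1, 2, 3, 4, 5, 6, 7, 8] 2
```

Because only tokens the target model itself would have chosen are kept, the output is identical to plain greedy decoding; the gain comes from confirming several tokens per verification round. In this toy run, 8 tokens need only 2 rounds, whereas a draft that always guesses wrong falls back to one token per round.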

Could it extend beyond Nvidia?

ReDrafter’s integration into Nvidia’s TensorRT-LLM framework brings faster LLM inference to Nvidia GPUs, which are widely used in production environments.

To accommodate ReDrafter’s algorithms, Nvidia introduced new operators and tweaked existing ones within TensorRT-LLM, making the tech available to any developer looking to optimize performance for large-scale models.

In addition to the speed improvements, Apple says ReDrafter has the potential to reduce user latency while requiring fewer GPUs. This efficiency not only lowers computational costs but also lessens power consumption, a vital factor for organizations managing large-scale AI deployments.

While the focus of this collaboration remains on Nvidia’s infrastructure for now, similar performance benefits could eventually be extended to rival GPUs from AMD or Intel.

Breakthroughs like this can help improve machine learning efficiency. As Nvidia says, “This collaboration has made TensorRT-LLM more powerful and more flexible, enabling the LLM community to innovate more sophisticated models and easily deploy them with TensorRT-LLM to achieve unparalleled performance on Nvidia GPUs. These new features open exciting possibilities, and we eagerly anticipate the next generation of advanced models from the community that leverage TensorRT-LLM capabilities, driving further improvements in LLM workloads.”

You can read more about the collaboration with Apple on the Nvidia Developer Technical Blog.

Wayne Williams
Editor

Wayne Williams is a freelancer writing news for TechRadar Pro. He has been writing about computers, technology, and the web for 30 years. In that time he wrote for most of the UK’s PC magazines, and launched, edited and published a number of them too.
