100x less compute with GPT-level LLM performance: How a little-known open source project could help solve the GPU power conundrum — RWKV looks promising but challenges remain

Recurrent Neural Networks (RNNs) are a class of neural network widely used in deep learning. Unlike feedforward networks, RNNs maintain a memory that captures information about what has been computed so far. In other words, they use their understanding of previous inputs to influence the output they produce.

RNNs are called "recurrent" because they perform the same task for every element in a sequence, with each output depending on the previous computations. RNNs still power smart technologies such as Apple's Siri and Google Translate.
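To make that concrete, here is a minimal sketch of the recurrence at the heart of a vanilla RNN, written in plain NumPy; the dimensions and weight names (W_x, W_h) are illustrative assumptions, not taken from any particular model:

```python
import numpy as np

# Illustrative dimensions, not from any specific model
input_dim, hidden_dim = 8, 16
rng = np.random.default_rng(0)
W_x = rng.normal(size=(hidden_dim, input_dim)) * 0.1   # input-to-hidden weights
W_h = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1  # hidden-to-hidden weights

def rnn_forward(inputs):
    """Process a sequence one step at a time; each step reuses
    the hidden state produced by the previous step."""
    h = np.zeros(hidden_dim)            # the "memory" starts empty
    for x in inputs:                    # same computation for every element
        h = np.tanh(W_x @ x + W_h @ h)  # output depends on past state
    return h

sequence = rng.normal(size=(5, input_dim))  # a toy 5-step input sequence
final_state = rnn_forward(sequence)
```

Note that the loop has to run one step at a time, because each step consumes the state produced by the last; that sequential dependency is what makes RNNs hard to parallelize on modern GPUs.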

However, with the advent of transformer-based models such as ChatGPT, the landscape of natural language processing (NLP) has shifted. Transformers revolutionized NLP tasks, but their memory and computational cost scale quadratically with sequence length, demanding ever more resources as contexts grow.
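That quadratic growth is easy to see in code. The toy single-head self-attention below (again NumPy, illustrative sizes, no masking or other refinements) materializes a T × T score matrix, so doubling the sequence length roughly quadruples the memory and compute:

```python
import numpy as np

def attention(Q, K, V):
    """Toy single-head self-attention. Q, K, V: shape (T, d)."""
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                    # (T, T) matrix: O(T^2)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (T, T) @ (T, d) product

rng = np.random.default_rng(0)
T, d = 1024, 64                                      # illustrative sizes
Q, K, V = rng.normal(size=(3, T, d))                 # three random matrices
out = attention(Q, K, V)    # memory is dominated by the (T, T) score matrix
```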


Enter RWKV

Now, a new open source project, RWKV, is offering a promising answer to the GPU power conundrum. The project, backed by the Linux Foundation, aims to drastically reduce the compute required for GPT-level large language models (LLMs), potentially by up to 100x.

RNNs, by contrast, exhibit linear scaling in memory and computational requirements, but they have struggled to match the performance of transformers because, as the sequential loop in the earlier sketch illustrates, they are difficult to parallelize and therefore hard to scale. This is where RWKV comes into play.

RWKV, or Receptance Weighted Key Value, is a novel model architecture that combines the parallelizable training of transformers with the efficient inference of RNNs. The result is a model that requires significantly fewer resources (VRAM, CPU, GPU, and so on) to run and train while maintaining high-quality performance. It also scales linearly with context length, and its training gives it comparatively strong performance in languages other than English.
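RWKV's actual time-mixing formula involves learned per-channel decay and bonus terms, so the sketch below is only a toy recurrence in the same spirit: it shows how a fixed-size running state can stand in for the T × T attention matrix, keeping inference cost linear in sequence length. All names, shapes, and the scalar decay are illustrative assumptions, not the published RWKV equations:

```python
import numpy as np

def rwkv_style_inference(keys, values, decay=0.9):
    """Simplified linear-time recurrence in the spirit of RWKV's WKV
    mechanism (the real model uses learned, per-channel decay and a
    bonus term for the current token; this is a toy version).
    keys, values: shape (T, d). Cost is O(T * d), not O(T^2)."""
    T, d = keys.shape
    num = np.zeros(d)                       # running weighted sum of values
    den = np.zeros(d)                       # running sum of weights
    outputs = np.empty((T, d))
    for t in range(T):
        w = np.exp(keys[t])                 # weight for the current token
        num = decay * num + w * values[t]   # exponentially decayed state
        den = decay * den + w
        outputs[t] = num / (den + 1e-8)     # constant-size state per step
    return outputs

rng = np.random.default_rng(0)
T, d = 1024, 64
out = rwkv_style_inference(rng.normal(size=(T, d)), rng.normal(size=(T, d)))
```

Because the state carried from step to step has a fixed size, inference memory stays flat however long the context grows; the same computation can also be expressed in a parallel form for training, which is how RWKV aims to recover transformer-style training efficiency.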

Despite these promising features, the RWKV model is not without its challenges. It is sensitive to prompt formatting and weaker at tasks requiring look-back. However, these issues are being addressed, and the model's potential benefits far outweigh the current limitations.

The implications of the RWKV project are profound. Instead of needing 100 GPUs to train an LLM, an RWKV model could deliver similar results with fewer than 10. This not only makes the technology more accessible but also opens up possibilities for further advancements.

Wayne Williams
Editor

Wayne Williams is a freelancer writing news for TechRadar Pro. He has been writing about computers, technology, and the web for 30 years. In that time he wrote for most of the UK’s PC magazines, and launched, edited and published a number of them too.
