Startup claims to boost LLM performance using standard memory instead of GPU HBM — but experts remain unconvinced by the numbers despite promising CXL technology

AI brain (Image credit: Getty Images)

MemVerge, a provider of software designed to accelerate and optimize data-intensive applications, has partnered with Micron to boost the performance of LLMs using Compute Express Link (CXL) technology. 

The company's Memory Machine software uses CXL to reduce idle time in GPUs caused by memory loading.

The technology was demonstrated at Micron's booth at Nvidia GTC 2024. Charles Fan, CEO and co-founder of MemVerge, said, “Scaling LLM performance cost-effectively means keeping the GPUs fed with data. Our demo at GTC demonstrates that pools of tiered memory not only drive performance higher but also maximize the utilization of precious GPU resources.”

Impressive results

The demo used FlexGen, a high-throughput generation engine, running an OPT-66B large language model. It was performed on a Supermicro Petascale Server equipped with an AMD Genoa CPU, an Nvidia A10 GPU, Micron DDR5-4800 DIMMs, Micron CZ120 CXL memory modules, and MemVerge Memory Machine X intelligent tiering software.

The demo contrasted the performance of a job running on an A10 GPU with 24GB of GDDR6 memory, fed from 8x 32GB Micron DRAM modules, against the same job running on the Supermicro server fitted with a Micron CZ120 CXL 24GB memory expander and the MemVerge software.

With tiered memory, the FlexGen benchmark completed its tasks in less than half the time of the traditional NVMe storage approach. GPU utilization also jumped from 51.8% to 91.8%, reportedly a result of the Memory Machine X software transparently tiering data across GPU, CPU, and CXL memory.
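The basic idea behind that tiering can be illustrated with a toy sketch. This is not MemVerge's actual algorithm (which is proprietary); it simply shows the principle of placing data in the fastest tier with free capacity, so the GPU rarely stalls waiting on slow NVMe reads. The tier capacities for GPU and DRAM match the demo hardware; the CXL pool size and the tensor names and sizes are illustrative assumptions.

```python
# Toy illustration of transparent memory tiering (hypothetical sketch, not
# Memory Machine X's real logic). Data is placed greedily in the fastest
# tier that still has room: GPU memory -> CPU DRAM -> CXL-attached pool.

TIERS = [
    ("gpu_gddr6", 24),   # GiB: the A10's 24GB of GDDR6, per the demo
    ("cpu_dram", 256),   # GiB: 8x 32GB Micron DDR5 DIMMs, per the demo
    ("cxl_pool", 256),   # GiB: CXL expander capacity (illustrative)
]

def place(tensors):
    """Place (name, size_gib) pairs, hottest first, into the fastest
    tier with free capacity; return {tensor_name: tier_name}."""
    free = {name: cap for name, cap in TIERS}
    placement = {}
    for name, size in tensors:
        for tier, _ in TIERS:
            if free[tier] >= size:
                free[tier] -= size
                placement[name] = tier
                break
        else:
            raise MemoryError(f"no tier can hold {name} ({size} GiB)")
    return placement

# An OPT-66B-scale job (sizes are rough illustrations): fp16 weights
# overflow the GPU and land in DRAM, while the colder part of the
# KV cache spills to the CXL pool instead of NVMe.
demo = [("kv_cache_hot", 20), ("weights", 132), ("kv_cache_cold", 200)]
print(place(demo))
```

Without the CXL tier, the 200 GiB of cold cache in this sketch would have to spill to NVMe storage, which is the slow path the demo's tiered configuration avoids.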

Raj Narasimhan, senior vice president and general manager of Micron’s Compute and Networking Business Unit, said, “Through our collaboration with MemVerge, Micron is able to demonstrate the substantial benefits of CXL memory modules to improve effective GPU throughput for AI applications resulting in faster time to insights for customers. Micron’s innovations across the memory portfolio provide compute with the necessary memory capacity and bandwidth to scale AI use cases from cloud to the edge.”

However, experts remain skeptical about the claims. Blocks and Files pointed out that the Nvidia A10 GPU uses GDDR6 memory, not HBM. A MemVerge spokesperson responded to this point, and to others the site raised, stating, “Our solution does have the same effect on the other GPUs with HBM. Between Flexgen’s memory offloading capabilities and Memory Machine X’s memory tiering capabilities, the solution is managing the entire memory hierarchy that includes GPU, CPU and CXL memory modules.”

MemVerge Memory Machine X results (Image credit: MemVerge)

Wayne Williams
Editor

Wayne Williams is a freelancer writing news for TechRadar Pro. He has been writing about computers, technology, and the web for 30 years. In that time he wrote for most of the UK’s PC magazines, and launched, edited and published a number of them too.
