'It is obscenely fast' — Biggest rival to Nvidia demos million-core super AI inference chip that obliterates the DGX H100 with 44GB of superfast memory, and you can even try it for free

Cerebras AI inference (Image credit: Cerebras)

Cerebras has unveiled its latest AI inference chip, which it touts as a formidable rival to Nvidia's DGX H100.

The chip features 44GB of high-speed on-chip SRAM, allowing the platform to handle AI models ranging from billions to trillions of parameters.

For models that exceed the memory capacity of a single wafer, Cerebras can split them at layer boundaries and distribute them across multiple CS-3 systems. A single CS-3 system can accommodate 20-billion-parameter models, while 70-billion-parameter models can be handled by as few as four systems.
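To make the layer-boundary split concrete, here is a minimal sketch of pipeline-style partitioning. The even-split logic is an illustrative assumption rather than Cerebras's actual scheme; the 80-layer depth of Llama 3.1 70B is a published figure.

```python
# A minimal sketch of splitting a model at layer boundaries across
# several systems, pipeline-parallel style. The even-split logic is
# an illustrative assumption, not Cerebras's actual scheme.
def partition_layers(num_layers: int, num_systems: int) -> list[range]:
    """Assign each system a contiguous range of layers."""
    base, extra = divmod(num_layers, num_systems)
    ranges, start = [], 0
    for i in range(num_systems):
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges

# Llama 3.1 70B has 80 transformer layers; spread over four CS-3 systems:
for system, layers in enumerate(partition_layers(80, 4)):
    print(f"CS-3 #{system + 1}: layers {layers.start}-{layers.stop - 1}")
```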

Additional model support coming soon

Cerebras emphasizes that it keeps model weights at 16-bit precision to preserve accuracy, in contrast with some competitors that cut weights to 8 bits, which can degrade performance. According to Cerebras, its 16-bit models score up to 5% higher on multi-turn conversation, math, and reasoning tasks than 8-bit models, delivering more accurate and reliable outputs.
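As a toy illustration of the precision trade-off being described, the sketch below round-trips a set of float16 weights through symmetric int8 quantization and measures the reconstruction error. It is a generic demo of quantization loss, not Cerebras's benchmark methodology.

```python
# A toy demo of the information lost when 16-bit weights are squeezed
# into 8 bits, the trade-off described above. Generic symmetric int8
# quantization; not Cerebras's benchmark methodology.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, 1000).astype(np.float16)   # typical weight scale

scale = float(np.abs(w).max()) / 127                  # symmetric int8 scale
w_int8 = np.round(w.astype(np.float32) / scale).astype(np.int8)
w_back = w_int8.astype(np.float32) * scale            # dequantized weights

err = np.abs(w.astype(np.float32) - w_back)
print(f"mean abs error: {err.mean():.6f}  max abs error: {err.max():.6f}")
```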

The Cerebras inference platform is available via chat and API access, and is designed to be easy to adopt for developers already familiar with OpenAI's Chat Completions format. The platform runs Llama 3.1 70B at 450 tokens per second, which Cerebras claims makes it the only solution to achieve instantaneous speed for models of that size. For developers, Cerebras is offering 1 million free tokens daily at launch, with pricing for large-scale deployments said to be significantly lower than popular GPU clouds.
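Because the API follows OpenAI's Chat Completions format, integration can be as simple as repointing an existing client. The sketch below assumes the endpoint URL and model identifier; check Cerebras's documentation for the exact values.

```python
# A minimal sketch of calling Cerebras inference through the
# OpenAI-compatible Chat Completions format described above.
# The base URL and model identifier are assumptions for
# illustration; consult Cerebras's documentation for exact values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed endpoint
    api_key="YOUR_CEREBRAS_API_KEY",          # placeholder key
)

response = client.chat.completions.create(
    model="llama3.1-70b",                     # assumed model id
    messages=[
        {"role": "user", "content": "Summarize wafer-scale inference in two sentences."},
    ],
)
print(response.choices[0].message.content)
```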

Cerebras is initially launching with Llama 3.1 8B and 70B models, with plans to add support for larger models such as Llama 3.1 405B and Mistral Large 2 in the near future. The company argues that fast inference is crucial for enabling more complex AI workflows and enhancing real-time LLM intelligence, particularly for techniques like scaffolding, which consume substantial numbers of tokens.

Patrick Kennedy from ServeTheHome saw the product in action at the recent Hot Chips 2024 symposium, noting, “I had the opportunity to sit with Andrew Feldman (CEO of Cerebras) before the talk and he showed me the demos live. It is obscenely fast. The reason this matters is not just for human to prompt interaction. Instead, in a world of agents where computer AI agents talk to several other computer AI agents. Imagine if it takes seconds for each agent to come out with output, and there are multiple steps in that pipeline. If you think about automated AI agent pipelines, then you need fast inferencing to reduce the time for the entire chain.”
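Kennedy's point is easy to quantify: in a sequential agent pipeline, per-call generation time adds up across every step. A back-of-the-envelope sketch follows; the five-step pipeline, 300-token budget, and ~30 tokens-per-second GPU baseline are illustrative assumptions, while 450 tokens per second is the Llama 3.1 70B figure quoted above.

```python
# A back-of-the-envelope sketch of why per-step inference speed
# compounds in multi-agent pipelines. Step count, token budget, and
# the GPU baseline are illustrative assumptions; 450 tok/s is the
# Llama 3.1 70B figure cited in the article.
STEPS = 5
TOKENS_PER_STEP = 300

for name, tokens_per_second in [("GPU baseline (~30 tok/s)", 30),
                                ("Cerebras (450 tok/s)", 450)]:
    total_seconds = STEPS * TOKENS_PER_STEP / tokens_per_second
    print(f"{name}: {total_seconds:.1f} s end-to-end")
```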

Cerebras positions its platform as setting a new standard in open LLM development and deployment, offering record-breaking performance, competitive pricing, and broad API access. You can try it out by going to inference.cerebras.ai or by scanning the QR code in the slide below.

Cerebras AI inference (Image credit: Cerebras / Hot Chips)

