Microsoft deliberately chose to use old tech for its Nvidia GPU rival — Maia 100 AI accelerator uses HBM2E memory and the mysterious ability to 'unlock new capabilities' via firmware update


At the recent Hot Chips 2024 symposium, Microsoft revealed details about its first-generation custom AI accelerator, the Maia 100, designed for large-scale AI workloads on its Azure platform.

Unlike its rivals, Microsoft has opted for older HBM2E memory technology. The chip also has the intriguing ability to "unlock new capabilities" via firmware updates. The choice of memory appears to be a strategic move to balance performance and cost efficiency.

The Maia 100 accelerator is a reticle-size SoC, built on TSMC’s N5 process and featuring a CoWoS-S interposer. It includes four HBM2E memory dies, delivering 1.8TBps bandwidth and 64GB capacity, tailored for high-throughput AI workloads. The chip is designed to support up to 700W TDP but is provisioned at 500W, making it energy-efficient for its class.
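As a rough back-of-the-envelope check on those figures, the short Python sketch below splits the quoted totals across the four HBM2E dies and compares the provisioned power with the design ceiling. The even split across dies is an assumption for illustration, not something Microsoft has stated, and the variable names are hypothetical.

```python
# Figures quoted in the article for Maia 100.
HBM_DIES = 4
TOTAL_BANDWIDTH_TB_S = 1.8   # terabytes per second, all dies combined
TOTAL_CAPACITY_GB = 64
TDP_MAX_W = 700
TDP_PROVISIONED_W = 500


def per_die(total, dies=HBM_DIES):
    """Split a total evenly across the dies (an assumption, not confirmed)."""
    return total / dies


# Decimal units assumed: 1 TB/s = 1000 GB/s.
bandwidth_per_die_gb_s = per_die(TOTAL_BANDWIDTH_TB_S * 1000)
capacity_per_die_gb = per_die(TOTAL_CAPACITY_GB)
power_fraction = TDP_PROVISIONED_W / TDP_MAX_W

print(f"{bandwidth_per_die_gb_s:.0f} GB/s per die")
print(f"{capacity_per_die_gb:.0f} GB per die")
print(f"provisioned at {power_fraction:.0%} of design TDP")
```

Under that assumption, each die would contribute roughly 450GB/s and 16GB, and the chip would run at about 71% of its maximum rated power.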

"Not as capable as a Nvidia H100"

Microsoft's approach with Maia 100 emphasizes a vertically integrated architecture, from custom server boards to specialized racks and a software stack designed to enhance AI capabilities. The architecture includes a high-speed tensor unit and a custom vector processor, supporting various data formats and optimized for machine learning needs.

Additionally, the Maia 100 supports Ethernet-based interconnects with up to 4800Gbps all-gather and scatter-reduced bandwidth, using a custom RoCE-like protocol for reliable, secure data transmission.
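To put the interconnect figure on the same scale as the memory figure, the sketch below converts the quoted 4800Gbps into gigabytes per second and compares it with the 1.8TBps HBM2E bandwidth. It assumes decimal units and 8 bits per byte with no protocol overhead; the function name is illustrative only.

```python
def gigabits_to_gigabytes_per_s(gigabits_per_s):
    """Convert Gbps to GB/s, assuming 8 bits per byte and no overhead."""
    return gigabits_per_s / 8


# Figures quoted in the article.
network_gb_s = gigabits_to_gigabytes_per_s(4800)   # all-gather / scatter-reduce
hbm_gb_s = 1.8 * 1000                              # 1.8 TB/s in decimal GB/s

# How many times faster local memory is than the quoted network peak.
ratio = hbm_gb_s / network_gb_s

print(f"network: {network_gb_s:.0f} GB/s")
print(f"HBM is about {ratio:.1f}x the network peak")
```

On these numbers the peak collective bandwidth works out to about 600GB/s, roughly a third of the local memory bandwidth.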

Patrick Kennedy from ServeTheHome reported on Maia at Hot Chips, noting, “It was really interesting that this is a 500W/700W device with 64GB of HBM2E. One would expect it to be not as capable as a Nvidia H100 since it has less HBM capacity. At the same time, it is using a good amount of power. In today’s power-constrained world, it feels like Microsoft must be able to make these a lot less expensive than Nvidia GPUs.”

The Maia SDK simplifies deployment by allowing developers to port their models with minimal code changes, supporting both PyTorch and Triton programming models. This enables developers to optimize workload performance across different hardware backends without sacrificing efficiency.


Wayne Williams is a freelancer writing news for TechRadar Pro. He has been writing about computers, technology, and the web for 30 years. In that time he wrote for most of the UK’s PC magazines, and launched, edited and published a number of them too.
