Evolving data center cooling for AI workloads

In today's rapidly changing technological landscape, artificial intelligence (AI) is driving a surge in demand for high-performance computing. AI applications that rely on machine learning (ML) and deep learning algorithms require immense computational power to process vast datasets and execute complex tasks, and that computational intensity generates substantial heat within the data center.

Traditional air-cooled systems often struggle to dissipate the heat that dense AI hardware produces, so liquid cooling technologies are becoming indispensable. Liquid cooling either submerges hardware components in a dielectric fluid or delivers coolant directly to heat-generating parts, which removes heat more effectively and improves performance and reliability for AI and similar high-density environments.

David Watkins

Solutions Director at VIRTUS Data Centres.

What Types of Liquid Cooling are Available?

Flexibility is key in cooling solutions, so it is important to understand the main liquid cooling options available:

1. Immersion Cooling: This method fully submerges specialized IT hardware, such as servers and graphics processing units (GPUs), in a dielectric fluid like mineral oil or a synthetic coolant within a sealed enclosure. Whereas traditional air-cooled systems rely on circulating air to dissipate heat, immersion cooling places hardware in direct contact with a fluid that absorbs heat far more efficiently. This direct contact improves heat dissipation and reduces the hot spots and thermal inefficiencies associated with air cooling. By reducing or eliminating the need for energy-intensive air conditioning, immersion cooling improves energy efficiency and lowers operational costs over time.

Moreover, it enables data centers to achieve higher density configurations by compactly arranging hardware without the spatial constraints imposed by air-cooled systems. By optimizing both space and energy utilization, immersion cooling is particularly well-suited for meeting the intense computational demands of AI workloads while ensuring reliable performance and scalability.

2. Direct-to-Chip Cooling: This approach delivers coolant directly to heat-generating components such as central processing units (CPUs) and GPUs, typically through cold plates mounted on the chips. Microfluidic cooling, a related technique, applies the same idea at the micro-level.

Unlike immersion cooling, which submerges entire hardware units, direct-to-chip cooling targets specific hot spots within individual processors. This targeted method maximizes heat transfer at the source, carrying heat away from critical components where it is generated most intensely. By mitigating thermal bottlenecks and reducing the risk of performance degradation due to overheating, direct-to-chip cooling improves the reliability and lifespan of the hardware that runs AI applications. This precision is essential for maintaining optimal operating temperatures and consistent performance under high computational loads.
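The reason liquids outperform air in both methods can be sketched with the steady-state heat balance Q = m_dot * c_p * delta_T. The short example below is an illustration only, not a sizing tool: the 40 kW rack load and the 10 K coolant temperature rise are hypothetical figures, the fluid properties are approximate textbook values near room temperature, and water stands in for the liquid side (dielectric immersion fluids have different properties). The function name is also an assumption made for this sketch.

```python
# Approximate fluid properties near room temperature (textbook values).
AIR = {"density_kg_m3": 1.2, "cp_j_kg_k": 1005.0}
WATER = {"density_kg_m3": 997.0, "cp_j_kg_k": 4180.0}


def volumetric_flow_m3_s(heat_load_w, delta_t_k, fluid):
    """Return the coolant flow (m^3/s) needed to carry away heat_load_w.

    Uses the steady-state heat balance Q = m_dot * c_p * delta_T,
    so m_dot = Q / (c_p * delta_T), then converts mass flow to volume flow.
    """
    if heat_load_w < 0 or delta_t_k <= 0:
        raise ValueError("heat load must be >= 0 and temperature rise > 0")
    mass_flow_kg_s = heat_load_w / (fluid["cp_j_kg_k"] * delta_t_k)
    return mass_flow_kg_s / fluid["density_kg_m3"]


if __name__ == "__main__":
    rack_heat_w = 40_000.0  # hypothetical 40 kW AI rack
    delta_t_k = 10.0        # hypothetical 10 K coolant temperature rise

    air = volumetric_flow_m3_s(rack_heat_w, delta_t_k, AIR)
    water = volumetric_flow_m3_s(rack_heat_w, delta_t_k, WATER)

    print(f"air flow needed:   {air:.2f} m^3/s")
    print(f"water flow needed: {water * 1000:.2f} L/s")
    print(f"air needs about {air / water:.0f} times the volume of water")
```

Under these assumptions, air has to move several thousand times the volume that water does to remove the same heat, which is why air cooling runs out of headroom as rack densities climb.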

The versatility of liquid cooling technologies offers data center operators the flexibility to adopt a multi-faceted approach tailored to their infrastructure and AI workload requirements. Different cooling technologies have unique strengths and limitations, and providers can combine immersion cooling, direct-to-chip cooling, and air cooling to achieve optimal efficiency across different components and workload types.

As AI workloads evolve, data centers must accommodate increasing computational demands while maintaining efficient heat dissipation. Integrating multiple cooling technologies provides scalability options and supports future upgrades without compromising performance or reliability.

Challenges and Innovations in Liquid Cooling

Whilst innovative liquid cooling technologies promise to address the challenges posed by AI workloads, adoption presents hurdles such as initial investment costs and system complexity. Compared with traditional air-based solutions, liquid cooling systems require specialized components and careful integration into existing data center infrastructure. Retrofitting older facilities can be costly and complex, whereas new data centers can be designed to support AI workloads from inception.

Scalability remains a critical consideration. Data centers must adapt cooling systems to meet evolving workload requirements without sacrificing efficiency or reliability. Liquid cooling offers potential energy savings compared to air cooling, contributing to sustainability efforts by reducing overall facility energy consumption.
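One common way to express facility-level energy efficiency is power usage effectiveness (PUE), defined as total facility energy divided by IT equipment energy. The sketch below shows how a lower cooling overhead moves that ratio. Every energy figure in it is hypothetical and chosen only to make the arithmetic easy to follow; none of them come from a real facility or from the article.

```python
def pue(it_energy_kwh, cooling_energy_kwh, other_overhead_kwh=0.0):
    """Power usage effectiveness: total facility energy / IT energy."""
    if it_energy_kwh <= 0:
        raise ValueError("IT energy must be positive")
    total_kwh = it_energy_kwh + cooling_energy_kwh + other_overhead_kwh
    return total_kwh / it_energy_kwh


if __name__ == "__main__":
    it_kwh = 1000.0     # hypothetical IT load over some period
    other_kwh = 100.0   # hypothetical lighting, power distribution losses

    # Hypothetical cooling energy for the same IT load under each approach.
    air_cooled = pue(it_kwh, cooling_energy_kwh=400.0, other_overhead_kwh=other_kwh)
    liquid_cooled = pue(it_kwh, cooling_energy_kwh=150.0, other_overhead_kwh=other_kwh)

    print(f"air-cooled PUE:    {air_cooled:.2f}")
    print(f"liquid-cooled PUE: {liquid_cooled:.2f}")
```

A PUE closer to 1.0 means more of the facility's energy reaches the IT equipment. Real savings depend on climate, facility design, and the cooling technology chosen.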

Choosing the Right Partner for Liquid Cooling Solutions

Selecting a reliable partner or vendor for liquid cooling solutions is crucial for ensuring successful integration and optimal performance in data center environments. Key considerations include:

1. Expertise and Experience: Look for vendors with a proven track record in designing, implementing, and maintaining liquid cooling systems tailored specifically to high-performance computing (HPC) or AI workloads. Experience in similar deployments can provide valuable insights and mitigate potential challenges.

2. Customization and Scalability: Evaluate vendors that offer customizable solutions capable of scaling with your data center's evolving needs. A flexible approach to cooling infrastructure is essential to accommodate future expansions and technological advancements in AI.

3. Support and Service: Assess the level of support and service offered by potential vendors. Reliable technical support and proactive maintenance are critical to minimizing downtime and ensuring continuous operation of AI applications.

4. Sustainability and Efficiency: Consider vendors committed to sustainability practices, such as energy-efficient cooling technologies and environmentally responsible coolant options. These factors contribute to reducing operational costs and minimizing environmental impact.

5. Collaborative Partnership: Seek vendors who prioritize collaboration and partnership. A cooperative approach fosters innovation and ensures alignment with your data center's long-term goals and strategic initiatives.

By partnering with the right vendor for liquid cooling solutions, data center operators can effectively manage the thermal challenges posed by AI workloads while optimizing performance, reliability, and sustainability.

Looking Ahead

Innovation is key to unlocking the full potential of liquid cooling for AI workloads in data centers. Collaborative partnerships with technology vendors and research institutions drive efficiency improvements and enable the development of customized cooling solutions tailored to the specific needs of AI applications.

This article was produced as part of TechRadarPro's Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro
