Artificial Intelligence (AI) and High Performance Computing (HPC) have emerged as key areas of opportunity for innovation and business transformation.
The challenge for IT leaders is to enable high-density workloads with the right IT infrastructure, and increasingly the community is debating advanced cooling technologies like liquid cooling.
While Direct Liquid Cooling (DLC) is being deployed in more data centers today than ever before, would you be surprised to know that we've been deploying it in our data center designs at Digital Reality since 2015? Did you also know that liquid cooling isn't always the right choice for every high-density AI or HPC workload?
In this post, I'll cover the basics of data center cooling needs for high-density workloads like AI and HPC, and how Digital Reality's legacy of innovation has prepared us for all kinds of advanced cooling techniques, including liquid cooling.
I'll also share case studies from my innovation journey that show how enabling innovation is about the right strategy and the right partners, rather than a one-size-fits-all approach.
Cooling requirements of high-density workloads
The density of an AI or HPC deployment determines its unique cooling requirements.
Power density requirements for AI and HPC can be 5-10 times higher than for other data center use cases, where typical conventional workloads are in the range of 5-8 kW per rack.
Some computing hardware is likely to enable power densities in excess of 100 kW/rack, and peak densities in the data center may reach 150 kW/rack in the next two years.
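As a rough back-of-the-envelope check on those figures, the sketch below multiplies the 5-8 kW per rack range by the 5-10x factor. The function name and the choice to simply multiply the ends of the two ranges are illustrative assumptions, not a sizing method.

```python
def scaled_density_range(base_kw, multiplier):
    """Return (low, high) kW per rack after applying a multiplier range."""
    low = base_kw[0] * multiplier[0]
    high = base_kw[1] * multiplier[1]
    return low, high


conventional_kw = (5, 8)   # kW per rack, from the text
ai_multiplier = (5, 10)    # "5-10 times higher", from the text

ai_low, ai_high = scaled_density_range(conventional_kw, ai_multiplier)
print(f"Implied AI/HPC density: {ai_low}-{ai_high} kW per rack")
```

On these numbers the implied range is 25-80 kW per rack, which puts the 100-150 kW/rack figures at the upper extreme rather than the norm.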
Conventional workload densities can be air-cooled. By and large, however, most AI and HPC workloads require specialized cooling such as direct liquid cooling (DLC), air-assisted liquid cooling (AALC), or rear-door heat exchangers.
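To see why density pushes designs toward liquid, it can help to estimate how much coolant has to move to carry the heat away. The sketch below uses the standard heat-transport relation Q = ρ · V · cp · ΔT, solved for the volumetric flow V. The fluid properties are approximate textbook values near room temperature, and the 10 K temperature rise is an assumption chosen only for illustration; real designs depend on the hardware and the facility.

```python
def volumetric_flow_m3_per_s(heat_w, density_kg_m3, cp_j_per_kg_k, delta_t_k):
    """Flow needed to carry heat_w watts: Q = rho * V * cp * dT, solved for V."""
    return heat_w / (density_kg_m3 * cp_j_per_kg_k * delta_t_k)


# Approximate fluid properties near room temperature (assumed values).
AIR = {"density_kg_m3": 1.2, "cp_j_per_kg_k": 1005.0}
WATER = {"density_kg_m3": 997.0, "cp_j_per_kg_k": 4180.0}

rack_heat_w = 100_000  # a 100 kW rack, the high-density figure mentioned earlier
delta_t_k = 10.0       # assumed coolant temperature rise across the rack

air_flow = volumetric_flow_m3_per_s(rack_heat_w, delta_t_k=delta_t_k, **AIR)
water_flow = volumetric_flow_m3_per_s(rack_heat_w, delta_t_k=delta_t_k, **WATER)

print(f"Air:   {air_flow:.2f} cubic metres per second")
print(f"Water: {water_flow * 1000:.2f} litres per second")
print(f"Air needs roughly {air_flow / water_flow:.0f} times the volume of water")
```

Under these assumptions, air has to move several cubic metres per second through a single rack, while water needs only a few litres per second. That gap is the basic reason liquid becomes attractive as density climbs.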
Not all AI and HPC workloads require liquid cooling.
Liquid cooling requirements vary by hardware vendor, specific hardware and type of workload. Liquid cooling is not suitable for all hardware or every scenario.
Even in the AI era, not every rack will be drawing 100 kW, and some may not even demand advanced cooling.
For example, inference deployments are less power hungry than training deployments and may be able to be cooled with conventional air-cooling techniques. Machine learning requires fewer resources, while deep learning and generative AI require larger environments due to their complexity.
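The point that cooling depends on hardware, vendor and workload rather than on the "AI" label can be sketched as a simple screening function. Everything here is hypothetical: the `Deployment` fields, the function name and especially the two kW cut-offs are invented to make the example run, and they are not Digital Reality design thresholds. A real assessment would weigh many more factors.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cut-offs in kW per rack, chosen only for illustration.
AIR_MAX_KW = 20.0
ASSISTED_MAX_KW = 50.0


@dataclass
class Deployment:
    name: str
    rack_kw: float
    vendor_requires_liquid: bool = False  # some hardware ships liquid-only


def candidate_cooling(d: Deployment) -> List[str]:
    """Return a shortlist of cooling options to evaluate, not a final answer."""
    if d.vendor_requires_liquid:
        return ["direct liquid cooling"]
    if d.rack_kw <= AIR_MAX_KW:
        return ["conventional air cooling"]
    if d.rack_kw <= ASSISTED_MAX_KW:
        return ["rear-door heat exchanger", "air-assisted liquid cooling"]
    return ["direct liquid cooling", "air-assisted liquid cooling"]


for d in (
    Deployment("inference cluster", rack_kw=8),
    Deployment("mixed HPC", rack_kw=35),
    Deployment("training cluster", rack_kw=70),
):
    print(f"{d.name}: {', '.join(candidate_cooling(d))}")
```

The function deliberately returns a shortlist rather than a single answer, since two deployments at the same density can still end up with different cooling designs.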
It is important for IT leaders to understand that different AI and HPC workloads have different cooling requirements and that not every data center partner will have the specialized knowledge or infrastructure capabilities to enable the technology.
Every deployment's requirements will be different, so it's important to work with a partner that can tailor solutions and not rely on a one-size-fits-all approach. That's why advanced cooling with Digital Reality's data center design expertise makes a difference for our customers.
Innovation strategies
Digital Reality's global data center platform, Platform Digital®, was chosen to be home to many groundbreaking AI and HPC workloads.
We've learned that to enable innovation, a few key strategies help us not only keep pace with technology, but stay one step ahead.
IT strategies that support AI and HPC workloads must enable:
- Agility
- Scale
- Sustainable development
These case studies from our own innovation journey over the last decade put these strategies into practice. They also show how our expertise and innovative strategies help us identify the right solution for the situation rather than relying on a one-size-fits-all approach.
Innovation case studies
Enabling Scale: A high-capacity trading engine with liquid cooling
2015 was a transformative year for us at Digital Reality. It was also my first year with the company. We embarked on an ambitious build for a global financial services company specializing in algorithmic high-frequency trading.
A key part of the project was to move from traditional air cooling to advanced liquid cooling at the chip level to support HPC clusters. This engineering feat not only increased the efficiency of the cooling system, but also meant that we were able to scale our technology to continue to support our client as their deployment reached nearly 6 MW.
Investing in next-generation liquid cooling technology was a decision we knew would enable our customer to move beyond their immediate needs and build a capability with a focus on long-term scalability and sustainability.
Enabling Sustainable Development: Supercomputing with Adaptive Design
Recently, we partnered with a European customer to develop a sophisticated supercomputer environment with 70 kW per rack in a mixed environment. The customer needed to deploy rapidly while complying with new sustainability regulations.
Waiting 3-5 years to build a new data center was not an option, which is why our ability to retrofit existing facilities gets customers deployed faster. With the energy-efficient facility we built in 2013, we were able to meet their demanding requirements for high power density and connectivity with minimal changes to our facility. This enabled 400% faster deployment.1
Our customer estimated a 30% improvement in energy efficiency by switching to liquid cooling.1 They also leveraged Digital Reality's Aquifer Thermal Energy Storage (ATES) cooling system and fully renewable energy sources to meet CO2 targets set by local sustainability regulations.
Our ability to develop retrofit designs reflects our commitment to modern, agile design that enables sustainable and timely development. Our design principles ensure that our infrastructure will meet not only current needs, but also needs decades into the future.
Enabling Agility: A flexible, future-proof AI deployment
Today, we are playing a key role in the development of generative AI (GenAI). We are working with a customer who is integrating more than 30,000 state-of-the-art GPUs into a large platform.
To enable higher computing performance, the deployment requires that every GPU be connected within a single computing cluster. The customer needed a data center platform provider that could help them deploy rapidly to get value from their GPU investment, which was made even more difficult by their special design requirements.
Our investment strategy is aimed at anticipating future demand, which has enabled us to match them with a facility that was built with a shell-ready design. Our agile, modular design approach enabled us to solve their complex design challenges while retaining 99% of the original design, which means we can build quickly.
Our agile approach will enable them to deploy in 12 months instead of the 36 months they would need with a custom build.1 Our customers' needs are changing rapidly, as are the technologies and solutions to meet them, so agility needs to be a core strategy to enable innovation.
Although this is the definition of a modern AI workload, direct liquid cooling was not the best choice for cooling. This is a good example of why a one-size-fits-all approach to cooling high-density workloads doesn't work.
Beyond infrastructure: Fostering a culture of innovation
To implement these innovation strategies, another important factor is your people. For all IT leaders, it's important to remember that our successes aren't just about infrastructure: they're about the culture of innovation we've fostered.
At Digital Reality, our talented teams bring a legacy of innovation and engineering that has earned us multiple awards as trailblazers in the data center space.
Our culture of innovation at Digital Reality enables alignment with our customers, ensuring our partners are comfortable growing with Digital Reality into the future.
A vision for the future
My role as Chief Technology Officer at Digital Reality is to understand the technological needs of our customers and ensure that Digital Reality can meet those needs, not just for today, but for tomorrow.
As we look to the future, we are dedicated to not only participating in the technological landscape, but actively shaping it. Our mission is to enable our customers' innovation by enabling agility, scale, and sustainable growth.
Sustainability is particularly important to us. We continue to expand our coverage of carbon-free and renewable energy sources in line with customer demand (we have more than 1 gigawatt of solar and wind power under contract), and we have started using alternative-fuel secondary power solutions to further reduce the lifecycle carbon footprint of our data centers.
We will focus on applying the best technology at the right time to meet our customers' needs, rather than maintaining the status quo and forcing tomorrow's customers to accept yesterday's limitations. It is this approach that has enabled Digital Reality to deliver the examples featured in this post, as well as to meet other customer needs around the world.
Our adaptability, innovative spirit, and rich heritage make us a unique and sustainable company in the evolving world of technology.
Building a legacy of innovation doesn't happen overnight, but at Digital Reality we've learned that when we stay true to our values and focus on how we can best serve our customers' needs, we are always moving in the right direction.
Join us at Digital Reality as we continue to define the future of technology. Be innovative, reach out to us, and let's deploy AI and HPC in a way that transforms your organization.
Learn more about AI-ready data center infrastructure:
1 Expected results for the customer, compared with their existing infrastructure before deploying on and connecting to Platform Digital®, or with alternative solutions available at the time of purchase.