"EE HPC WG Liquid Cooling Controls Team Whitepaper June 11, 2017" This paper defines data inputs for dynamic controls to manage high performance computing (HPC) facility and IT control systems. Each input includes parameters about measurement frequency and accuracy that are within a rough order of magnitude, but not an absolute limit. Each input also includes information about whether it would typically be provided by the facility or by the HPC system or whether its provision would have to be negotiated. This document is intended to be a guideline for data inputs to consider when designing a liquid cooling control system. It is not a design specification. Each site will develop their specific design based on their specific situation.
TEAM CHARTER:
Lessons learned and best practices are evolving from implementing and operating supercomputer centers with complex infrastructure systems, systems that face highly variable demands from today's supercomputers. The Liquid Cooling Controls Team initially focused on sharing designs, challenges and best practices for integrated control systems.
The team transitioned from this initial charter and started generating a list of data elements required for dynamic, integrated liquid cooling controls. The team is also collecting information on use cases to test and build support for the initial list of data elements. The results of this work will be captured in a whitepaper, and it is expected that those results will also be included in the EE HPC WG Energy Efficiency Considerations for HPC Procurement Documents.
CONTROLS DESIGN, CHALLENGES AND BEST PRACTICES REVIEW:
The team initially shared designs, challenges and best practices (see below).
DOCUMENTS:
CONTROLS ISSUES AND RECOMMENDED NEXT STEPS:
After sharing this information, the team compiled a list of all of the liquid cooling controls challenges, concerns, issues and opportunities that were identified both in the presentations and as a result of the review discussion. The team then synthesized this information and identified top problems and recommended next steps (see below).
Top problems with control systems:
Team Outcomes:
Focus:
CONTROLS HIGH-LEVEL GUIDELINE OUTLINE:
Transitioning to the generation of a whitepaper, the team has pursued two slightly differing approaches.
First, the team wrote an outline for the whitepaper. The outline was for a high-level Guideline of HPC and Data Center Controls Systems as well as an addendum for Sequence of Operations. The outline ended up being a five-page document. It was considered too broad for the team to embrace immediately and was tabled for future consideration.
DOCUMENT: 5 Page Outline (draft only)
CONTROLS DATA ELEMENTS:
Secondly, the team created a list of data elements deemed important for liquid cooling controls. These data elements are from both the IT systems and the data center building. This work is exploratory, as there are few implementations of dynamic integrated liquid cooling controls.
DOCUMENT: Please request latest copy from Natalie Bates
SYSTEM INTEGRATOR CONTROLS VISION AND ROAD MAP:
In order to test this list of Controls Data Elements, the team decided both to ask the system integrator vendor community to present their vision and road maps and to write case studies on the few sites that have implemented dynamic liquid cooling controls. Below is an excerpt from the invitation sent to the system integrators for a webinar where they would make their presentations.
Although there aren't any specific presentation format requirements, the expectation is that each presentation will address some or all of the following areas of interest for the HPC centers. It is recommended that you start with a block diagram describing your liquid cooling and control technology.
Presentations were made by HP, Cray, Lenovo, IBM and RSC. Below is a summary of these presentations.
Today’s state of the practice is to use commercial Coolant Distribution Units (CDUs) to manage the delivery of liquid to the HPC system. Most of the CDUs deployed with today’s HPC systems operate at constant flow-rate and temperature. The customer can set (and change) the inlet flow-rate and temperature as long as they stay within a specified envelope, which is set based on a maximum specified heat removal. Some customers re-set these points on a seasonal basis and could re-set them as the dew point changes. CDUs are deployed at different levels of the system, with the lowest level being the rack, but not the node, level. These CDUs range in their intelligence, but at least one vendor claims to have intelligent CDUs with integrated controls (specifics on the controls were limited).
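As a rough sketch of this state of the practice (the envelope limits, margin and function name below are illustrative assumptions, not vendor specifications), accepting a setpoint change amounts to checking the requested inlet temperature and flow-rate against a fixed envelope and the current dew point:

    # Minimal sketch of a constant-setpoint CDU check (illustrative values only).
    DEW_POINT_MARGIN_C = 2.0  # keep supply temperature above dew point to avoid condensation

    def setpoint_allowed(inlet_temp_c, flow_lpm, dew_point_c,
                         temp_range_c=(20.0, 45.0), flow_range_lpm=(50.0, 300.0)):
        """Return True if the requested inlet temperature and flow-rate stay
        inside the specified envelope and above the condensation limit."""
        min_t, max_t = temp_range_c
        min_f, max_f = flow_range_lpm
        if not (min_t <= inlet_temp_c <= max_t):
            return False
        if not (min_f <= flow_lpm <= max_f):
            return False
        # Seasonal or dew-point re-set: supply must stay above dew point plus a margin.
        return inlet_temp_c >= dew_point_c + DEW_POINT_MARGIN_C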
Tomorrow’s products could be designed to allow inlet flow-rate and temperature to vary based on actual, rather than maximum specified, heat removal. This would require more, and finer-grained, telemetry and controls.
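As a sketch of the kind of control this would enable (the values and function name are assumptions, not a vendor design), the required coolant flow can be derived from the measured heat load using Q = ṁ · cp · ΔT rather than from the maximum specified heat removal:

    # Sketch: size the flow-rate to the measured heat load instead of the rated maximum.
    WATER_CP_J_PER_KG_K = 4186.0    # specific heat of water
    WATER_DENSITY_KG_PER_L = 0.997  # near room temperature

    def required_flow_lpm(heat_load_w, supply_temp_c, return_temp_c):
        """Volumetric flow (liters per minute) needed to remove heat_load_w watts
        at the given supply/return temperature difference."""
        delta_t = return_temp_c - supply_temp_c
        if delta_t <= 0:
            raise ValueError("return temperature must exceed supply temperature")
        mass_flow_kg_s = heat_load_w / (WATER_CP_J_PER_KG_K * delta_t)
        return mass_flow_kg_s / WATER_DENSITY_KG_PER_L * 60.0

    # Example: a rack dissipating 40 kW with a 10 C temperature rise
    # needs roughly 57.5 liters per minute of water flow.
    print(round(required_flow_lpm(40_000, 30.0, 40.0), 1))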
Many questions were raised, some business and others technical.
These are outstanding questions that we hope will be answered with more data and analysis moving forward.
DOCUMENTS:
CASE STUDIES:
This forum will be followed by another webinar where some supercomputing center members of the EE HPC WG will disclose their thoughts, plans and expectations for HPC liquid cooling controls requirements.
The team is hosting this webinar to encourage communication between system integrators and users regarding liquid cooling controls. It is also trying to encourage participation and is especially looking for more examples of liquid cooling controls. Below is the plan for this webinar.
Introduction and Motivation:
CASE STUDIES:
David Martinez, SNL’s Sky Bridge
Tom Durbin, NCSA’s Blue Waters
Greg Rottman, ERDC
Torsten Wilde, LRZ’s Optimization Proposal
DOCUMENTS: