

Data Center Liquid Cooling


Liquid cooling is the category of thermal management that uses water (or a water-glycol mixture) as the heat-transport fluid between servers and the facility plant. It is the umbrella that contains direct-to-chip cooling, rear-door heat exchangers, and, by some taxonomies, single-phase immersion. The category exists because air becomes impractical as a heat-transport medium around 30 kilowatts per rack, and every AI training rack above that threshold has to move heat in something denser than air.

The engineering advantage of water over air is not subtle. Water carries roughly 3,500 times the heat per unit volume for the same temperature rise, which means a pipe the diameter of a garden hose can remove the thermal load that would require a room-scale air duct. This collapses the volume of mechanical infrastructure, collapses fan power, and pushes supply temperatures toward the thermal ceiling of the silicon rather than the ceiling of the air handler.
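
The arithmetic is easy to verify. A back-of-the-envelope sketch in Python, using textbook property values near room temperature (the 100 kW load and 10 K temperature rise are illustrative assumptions, not a design point):

    # Compare water and air as heat-transport media for the same load.
    # Property values are approximate room-temperature figures.
    RHO_WATER = 997.0    # kg/m^3
    CP_WATER = 4186.0    # J/(kg*K)
    RHO_AIR = 1.2        # kg/m^3
    CP_AIR = 1005.0      # J/(kg*K)

    def volumetric_flow(load_w, delta_t_k, rho, cp):
        """Volumetric flow (m^3/s) needed to absorb load_w at a delta_t_k rise."""
        return load_w / (rho * cp * delta_t_k)

    load, dt = 100_000.0, 10.0   # 100 kW rack, 10 K coolant temperature rise
    q_water = volumetric_flow(load, dt, RHO_WATER, CP_WATER)
    q_air = volumetric_flow(load, dt, RHO_AIR, CP_AIR)

    print(f"water: {q_water * 60_000:.0f} L/min")        # ~144 L/min
    print(f"air:   {q_air:.1f} m^3/s")                   # ~8.3 m^3/s
    print(f"ratio: {q_air / q_water:,.0f}x by volume")   # ~3,460x

Roughly 144 litres per minute of water does the work of more than eight cubic metres per second of air, which is the difference between a hose and a duct.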


The two-loop architecture

Liquid cooling in a data center is organized around two distinct water loops separated by a heat exchanger. The primary isolation boundary is the Coolant Distribution Unit, and everything else in the category is infrastructure that supports, distributes, or monitors those loops.

Loop | Water Quality | Path | Function
Facility Water Loop (FWL) | Treated facility water; higher mineral content acceptable | Cooling tower or chiller plant to CDU primary side | Carries heat from the hall to the mechanical plant for rejection
Technology Cooling System (TCS) | Ultrapure or tightly controlled; low conductivity and particulate | CDU secondary side to manifolds to cold plates or rear doors | Carries heat from silicon and rack hardware into the CDU

The separation matters for two reasons. First, the TCS runs directly through microchannel cold plates where any scaling, corrosion, or biofouling would raise thermal resistance and eventually block flow, so TCS water chemistry is held to semiconductor-grade tolerances. Second, the FWL runs through open cooling towers and carries whatever chemistry the local water source delivers, which would destroy a cold plate in weeks. The CDU heat exchanger enforces the boundary: heat crosses, water does not.
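
The boundary is easiest to see as an energy balance. A minimal sketch, assuming illustrative temperatures and a 2 K heat-exchanger approach (real values are set by the plant design and vendor datasheets):

    CP_WATER = 4186.0  # J/(kg*K)

    def fwl_flow_kg_s(load_w, fwl_supply_c, fwl_return_c):
        """FWL mass flow needed to carry load_w at the given temperature rise."""
        return load_w / (CP_WATER * (fwl_return_c - fwl_supply_c))

    load = 1_500_000.0    # 1.5 MW of rack heat on one CDU
    approach_k = 2.0      # plate-pack approach temperature (assumed)
    fwl_supply_c = 30.0   # facility water arriving from the plant
    fwl_return_c = 40.0   # facility water leaving for the towers
    tcs_supply_floor_c = fwl_supply_c + approach_k  # coldest possible TCS supply

    print(f"TCS supply floor: {tcs_supply_floor_c:.1f} C")  # 32.0 C
    print(f"FWL flow: {fwl_flow_kg_s(load, fwl_supply_c, fwl_return_c):.0f} kg/s")  # ~36 kg/s

Only heat moves across the plate pack: the TCS can never be supplied colder than the FWL plus the approach, and the FWL flow scales directly with the load it must carry away.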


The Coolant Distribution Unit

The CDU is the defining component of the category. It sits at the boundary between the two loops and performs three functions: heat exchange between FWL and TCS, circulation of the TCS through pumps, and control of TCS supply temperature and flow. CDUs also host the instrumentation for leak detection, flow balancing, and filtration on the TCS side.
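
The third function is a control problem: hold TCS supply temperature at setpoint as rack load swings. A minimal PI-loop sketch of that behavior, trimming the FWL-side control valve; the gains, setpoint, and structure are illustrative assumptions, not any vendor's firmware:

    class SupplyTempController:
        """Toy PI controller holding TCS supply temperature at setpoint."""
        def __init__(self, setpoint_c, kp=0.05, ki=0.01):
            self.setpoint_c = setpoint_c
            self.kp, self.ki = kp, ki
            self._integral = 0.0

        def valve_command(self, tcs_supply_c, dt_s):
            """Return FWL valve position in [0, 1]; opens as supply runs hot."""
            error = tcs_supply_c - self.setpoint_c   # positive when too warm
            self._integral += error * dt_s
            u = self.kp * error + self.ki * self._integral
            return min(max(u, 0.0), 1.0)             # clamp to valve travel

    ctrl = SupplyTempController(setpoint_c=32.0)
    for measured in (33.5, 33.0, 32.4, 32.1):        # supply cooling toward setpoint
        print(f"{measured:.1f} C -> valve {ctrl.valve_command(measured, dt_s=5.0):.2f}")

As the supply runs hot the valve opens, admitting more facility water to the heat exchanger; real controllers add anti-windup, pump staging, and alarm interlocks on top of this core loop.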

CDU Form Factor | Typical Capacity | Placement | Typical Use
In-rack CDU | Up to ~100 kW | Inside a standard rack, serving its own hardware | Pilot deployments, small clusters, enterprise HPC
In-row (sidecar) CDU | 200 kW to 2 MW | Dedicated cabinet within the row, serving adjacent racks | Mainstream hyperscale and AI training deployments
Row-level manifold CDU | 1 to 3 MW | End of row, feeding many racks through a row manifold | Dense AI clusters with standardized row designs
Facility-scale CDU | 2 to 10+ MW | Mechanical room or dedicated mechanical floor | Frontier AI sites with hundreds of liquid-cooled racks

The trend at frontier AI sites is toward larger, centralized CDUs feeding row manifolds rather than per-rack or per-row units. The engineering reason is reliability: a single 3 MW CDU with N+1 pumps and heat exchangers is easier to operate and more reliable than dozens of smaller in-row units spread across a hall.
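
The reliability argument can be made concrete with a simple availability model. A sketch, assuming made-up illustrative per-component failure probabilities rather than field data:

    from math import comb

    def p_at_least_k_failed(n, k, p_fail):
        """P(at least k of n independent components are down at once)."""
        return sum(comb(n, i) * p_fail**i * (1 - p_fail)**(n - i)
                   for i in range(k, n + 1))

    p = 0.01  # assumed probability any one pump or unit is down at an instant

    # One 3 MW CDU with N+1 pumps (needs 3 of 4): fails only if >= 2 are down.
    central = p_at_least_k_failed(4, 2, p)

    # Thirty non-redundant in-row CDUs: any single failure degrades its racks.
    distributed = p_at_least_k_failed(30, 1, p)

    print(f"central N+1:    {central:.2e}")      # ~5.9e-04
    print(f"30 small units: {distributed:.2e}")  # ~2.6e-01

Under these assumptions the centralized N+1 unit is orders of magnitude less likely to lose capacity than a hall of non-redundant in-row units is to lose at least one of them.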


Distribution infrastructure

Between the CDU and the rack, the TCS passes through distribution hardware that splits, routes, and connects the flow to individual servers or cold plates. Four component classes define this layer.

Row and rack manifolds split the TCS supply across multiple racks or multiple servers within a rack. They carry quick-disconnect fittings at every tap point to allow hot service.

Quick-disconnect fittings (QDs) seal automatically when uncoupled, allowing a cold plate, cold-plate sled, or entire server to be removed from the loop without draining or shutting down adjacent hardware. QD design is a quiet engineering bottleneck: drip rate, insertion force, cycle life, and material compatibility all matter, and the industry has converged on a small number of supplier designs.

Cold plates are the terminal devices that contact silicon packages and extract heat into the TCS. Cold plates are covered in detail on the Direct-to-Chip Cooling page.

Rear-door heat exchangers are water-cooled coils mounted on the back of an otherwise air-cooled rack. Hot exhaust air passes through the coil and transfers heat into the technology cooling loop before leaving the rack. RDHx units let operators raise effective rack density into the 40 to 50 kilowatt range without converting servers to cold-plate cooling, and they also handle the residual air-cooled heat (memory, power delivery, networking) on racks that use cold plates for the main silicon but not for every component.
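
The 40 to 50 kilowatt figure falls out of an air-side energy balance. A sketch, with illustrative airflow and coil temperature-drop assumptions:

    RHO_AIR = 1.2      # kg/m^3
    CP_AIR = 1005.0    # J/(kg*K)

    def rdhx_heat_w(airflow_m3_per_s, air_delta_t_k):
        """Heat removed from exhaust air crossing the rear-door coil."""
        return RHO_AIR * CP_AIR * airflow_m3_per_s * air_delta_t_k

    airflow = 2.5   # m^3/s through a dense rack (~5,300 CFM, assumed)
    dt = 15.0       # exhaust enters ~45 C, leaves near room temperature

    print(f"RDHx duty: {rdhx_heat_w(airflow, dt) / 1000:.0f} kW")  # ~45 kW

At a plausible dense-rack airflow and a 15 K drop across the coil, the door's duty lands squarely in the quoted range.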


Water chemistry and serviceability

The two loops impose very different chemistry regimes. The FWL tolerates municipal or tower water with conventional treatment: biocide dosing to suppress microbial growth, scale inhibitor to manage mineral deposition, corrosion inhibitor for the metallurgy of pipes and heat exchangers, and periodic blowdown to prevent concentration buildup as evaporation in the cooling tower removes pure water and leaves solutes behind.
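
The blowdown requirement is a mass balance. A sketch, using a common rule of thumb for evaporative loss; both the rule and the numbers are illustrative:

    def evaporation_L_per_s(heat_mw):
        """Rough rule: ~0.4-0.45 L/s evaporated per MW rejected; use 0.43."""
        return 0.43 * heat_mw

    def blowdown_L_per_s(evap, cycles):
        """Mass balance: makeup = evap + blowdown, cycles = makeup / blowdown."""
        return evap / (cycles - 1.0)

    heat_mw = 10.0   # hall rejecting 10 MW through the towers
    cycles = 4.0     # cycles-of-concentration limit set by water chemistry

    evap = evaporation_L_per_s(heat_mw)
    bd = blowdown_L_per_s(evap, cycles)
    print(f"evaporation: {evap:.1f} L/s, blowdown: {bd:.1f} L/s, "
          f"makeup: {evap + bd:.1f} L/s")

Tightening the cycles-of-concentration limit (cleaner tower water) raises blowdown and therefore total makeup; loosening it saves water at the cost of more aggressive chemistry.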

The TCS is a different discipline. Conductivity is held below a few microsiemens per centimeter to prevent galvanic corrosion in mixed-metal systems (copper cold plates, brass manifolds, stainless fittings). pH is tightly bounded. Particulate filtration runs continuously at sub-micron levels because any particle larger than the microchannel diameter in a cold plate will lodge and raise thermal resistance. Some operators run glycol mixtures for freeze protection in climates where TCS piping runs through unconditioned space.
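
In operational terms the TCS regime reduces to a set of bounds that monitoring must enforce. A sketch of such a gate; the limit values below are illustrative assumptions, since real limits come from the cold-plate and CDU vendors' wetted-materials lists:

    from dataclasses import dataclass

    @dataclass
    class TcsSample:
        conductivity_us_cm: float        # microsiemens per centimeter
        ph: float
        particles_gt_5um_per_ml: float

    LIMITS = {
        "conductivity_us_cm": (0.0, 3.0),       # keep galvanic currents low
        "ph": (7.0, 9.0),                       # bounded for Cu/brass/stainless
        "particles_gt_5um_per_ml": (0.0, 10.0), # below microchannel clog risk
    }

    def out_of_spec(sample):
        """Return the names of any parameters outside their allowed band."""
        bad = []
        for name, (lo, hi) in LIMITS.items():
            value = getattr(sample, name)
            if not (lo <= value <= hi):
                bad.append(name)
        return bad

    print(out_of_spec(TcsSample(conductivity_us_cm=4.2, ph=8.1,
                                particles_gt_5um_per_ml=2.0)))
    # ['conductivity_us_cm']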

Serviceability is the operational cost of liquid infrastructure. QDs and leak detection make it possible to remove a server from a live loop, but the discipline required to do so cleanly is different from hot-aisle hardware swap in an air-cooled hall. Facility staff need liquid-cooling-specific training, and the spare parts chain has to include cold plates, manifolds, QDs, and filter cartridges in addition to the traditional fan-and-drive inventory.


Where liquid cooling fits in the density spectrum

Liquid cooling as a category spans the density range from roughly 30 kilowatts per rack, where rear-door exchangers start to make economic sense, up to 250 kilowatts per rack and beyond, where direct-to-chip dominates. At the top of that range, immersion becomes a credible alternative, though in current AI deployments direct-to-chip continues to absorb the density envelope through progressively more aggressive cold-plate designs.

The economic crossover against air is driven by rack density, utilization, and local energy cost. At AI training densities (120 kilowatts per rack and up), liquid wins on total cost of ownership within two to three years of operation. At enterprise densities (15 to 25 kilowatts per rack), the crossover may never arrive and air remains correct.
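
A toy model shows why density drives the crossover. The sketch below counts only the liquid capex premium against the cooling energy saved; every figure is an illustrative assumption, not market data:

    def crossover_years(liquid_capex_premium, rack_kw,
                        cooling_overhead_air, cooling_overhead_liquid,
                        utilization, price_per_kwh):
        """Years until liquid's energy savings repay its capex premium."""
        saved_kw = rack_kw * (cooling_overhead_air - cooling_overhead_liquid)
        saved_per_year = saved_kw * utilization * 8760 * price_per_kwh
        return liquid_capex_premium / saved_per_year

    # Dense AI rack: big load, big overhead delta -> fast payback.
    print(f"{crossover_years(50_000, 120, 0.35, 0.12, 0.9, 0.08):.1f} years")  # ~2.9

    # Enterprise rack: small load, same premium spread thinner -> slow payback.
    print(f"{crossover_years(30_000, 20, 0.30, 0.15, 0.5, 0.08):.1f} years")   # ~28.5

Under these assumptions the premium at AI densities amortizes in under three years, while at enterprise densities the same math stretches past the hardware's useful life.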


Related coverage

Cooling and Thermal Management | Direct-to-Chip Cooling | Immersion Cooling | HVAC and Air Handling | UPW and Cooling Water Systems | Cooling Tower and Heat Rejection | Rack Layer | Cluster Layer | Cooling Monitoring