Deployment Case Study:
Oak Ridge National Laboratory


The Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory (ORNL) in Tennessee is home to Frontier, the world's first exascale supercomputer. With a peak of 1.68 exaflops, Frontier topped the TOP500 list when it debuted in 2022. It is the U.S. Department of Energy's (DOE) flagship HPC system, advancing climate research, nuclear science, energy systems, and AI training at a scale previously unattainable.


Overview

  • Location: Oak Ridge, Tennessee, USA
  • Operator: DOE Oak Ridge National Laboratory (ORNL)
  • System: Frontier (HPE Cray EX, AMD CPU + GPU architecture)
  • Performance: 1.68 exaflops peak, ~1.1 exaflops sustained on HPL (LINPACK); a back-of-envelope check follows this list
  • Scale: 9,408 compute nodes, 37,632 GPUs, ~8.7 million cores
  • Energy Demand: ~21 MW continuous draw
  • Commissioned: 2022
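
The headline numbers above are internally consistent and can be sanity-checked with simple arithmetic. The sketch below is a minimal check in plain C++; the 47.9 TFLOPS FP64 vector peak per MI250X is AMD's published spec and is an assumption not stated in this overview:

```cpp
// Back-of-envelope check of Frontier's headline numbers.
#include <cstdio>

int main() {
    const double nodes          = 9408;   // Frontier compute nodes
    const double gpus_per_node  = 4;      // MI250X modules per node
    const double tflops_per_gpu = 47.9;   // FP64 vector peak, MI250X (AMD spec)
    const double hpl_exaflops   = 1.1;    // sustained HPL (LINPACK) from above
    const double power_mw       = 21.0;   // ~21 MW continuous draw from above

    // Aggregate GPU peak in exaflops: nodes x GPUs x per-GPU TFLOPS.
    double peak_ef = nodes * gpus_per_node * tflops_per_gpu / 1.0e6;
    printf("GPU peak estimate: %.2f EF (reported Rpeak: 1.68 EF)\n", peak_ef);

    // Energy efficiency: sustained GFLOPS per watt.
    double gflops_per_watt = hpl_exaflops * 1.0e9 / (power_mw * 1.0e6);
    printf("efficiency: %.1f GFLOPS/W\n", gflops_per_watt);
    return 0;
}
```

The ~1.8 EF estimate overshoots the reported 1.68 EF because official peak figures depend on the exact clocks and configuration assumed; the ~52 GFLOPS/W result is in line with Frontier's publicly reported Green500 efficiency.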

System Architecture

  • CPU: AMD EPYC 7A53 "Trento" (custom HPC variant)
  • GPU: AMD Instinct MI250X accelerators
  • Node Count: 9,408 nodes, each with 1 CPU + 4 GPUs (software sees 8 GPU dies per node; see the sketch after this list)
  • Memory: ~9.2 PB total (4.6 PB HBM2e + 4.6 PB DDR4)
  • Interconnect: HPE Slingshot high-speed fabric
  • Performance: 1.68 EF peak, ~1.1 EF sustained
  • Power: ~21 MW facility draw
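
One detail the component list glosses over: each MI250X module contains two graphics compute dies (GCDs), so the runtime on a Frontier node reports eight GPU devices rather than four. A minimal HIP/C++ enumeration sketch (illustrative only; exact device names and memory sizes depend on the ROCm installation):

```cpp
// Enumerate the GPU devices visible on one node via the HIP runtime.
// On a 4 x MI250X node this reports 8 devices, because each MI250X
// module exposes its two GCDs as separate GPUs.
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess) {
        fprintf(stderr, "no HIP devices visible\n");
        return 1;
    }
    printf("visible GPU devices: %d\n", count);

    for (int i = 0; i < count; ++i) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, i);
        printf("  device %d: %s, %.0f GB HBM\n",
               i, prop.name, prop.totalGlobalMem / 1.0e9);
    }
    return 0;
}
```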

Strategic Significance

  • First Exascale System: Frontier is the first supercomputer to surpass 1 exaflop, marking a milestone in global HPC.
  • Scientific Leadership: Enables breakthroughs in energy, materials, climate, and nuclear research.
  • AI Integration: Supports hybrid HPC + AI workloads, including exascale-class model training and the fusion of simulation with AI.
  • National Security: Strengthens DOE’s leadership in computational science with direct implications for U.S. competitiveness.

Energy & Cooling

  • Power Demand: ~21 MW continuous draw, equivalent to a mid-sized industrial facility.
  • Cooling: Warm-water liquid cooling system designed to maximize efficiency.
  • PUE: Reported at roughly 1.03, among the lowest of any leadership-class facility, reflecting DOE sustainability mandates (illustrated below).
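
PUE (power usage effectiveness) is the ratio of total facility power to IT power, so a value near 1.03 means only a few percent of the power budget goes to cooling and distribution. A quick illustration, taking the ~21 MW draw from this case study and treating the 1.03 figure as approximate:

```cpp
// PUE = total facility power / IT equipment power.
// At ~21 MW of IT load and a reported PUE of ~1.03, cooling and
// power-delivery overhead stays well under 1 MW.
#include <cstdio>

int main() {
    const double it_power_mw = 21.0;  // Frontier's continuous draw
    const double pue         = 1.03;  // reported, approximate

    double facility_mw = it_power_mw * pue;
    double overhead_mw = facility_mw - it_power_mw;
    printf("facility: %.2f MW, overhead: %.2f MW (%.1f%%)\n",
           facility_mw, overhead_mw, 100.0 * (pue - 1.0));
    return 0;
}
```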

Key Challenges

  • Energy Consumption: Sustaining a reliable ~21 MW supply from the Tennessee Valley Authority (TVA) grid.
  • Scale of Parallelism: Coordinating ~8.7 million cores and 37,632 GPUs for efficient workloads (see the binding sketch after this list).
  • Hybrid Workloads: Balancing HPC simulations with next-gen AI model training pipelines.
  • Longevity: Ensuring Frontier’s competitiveness as Aurora (ANL) and El Capitan (LLNL) come online.
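
A common starting point for taming this degree of parallelism is to pin one MPI rank to each GPU die. The sketch below shows that pattern with MPI + HIP on a Frontier-style node with eight visible devices; it is an illustrative minimum, not OLCF's prescribed job configuration, and launcher/scheduler details are omitted:

```cpp
// One-rank-per-GPU binding: each MPI rank on a node claims one of the
// node's visible GPU dies (8 on a 4 x MI250X node), the usual starting
// point for scaling a hybrid HPC/AI workload across thousands of nodes.
#include <hip/hip_runtime.h>
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Ranks sharing a node get consecutive node-local indices.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int local_rank = 0;
    MPI_Comm_rank(node_comm, &local_rank);

    int num_gpus = 0;
    hipGetDeviceCount(&num_gpus);
    hipSetDevice(local_rank % num_gpus);  // bind this rank to one GCD

    printf("rank %d -> local %d -> GPU %d of %d\n",
           world_rank, local_rank, local_rank % num_gpus, num_gpus);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```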

Future Outlook

  • 2025–2030: Frontier is expected to remain a top-5 supercomputer globally, even as Aurora and El Capitan surpass it in raw performance.
  • HPC + AI Fusion: Increasing use of Frontier for coupled simulation and AI workloads (climate, fusion energy, biomedical research).
  • Upgrade Path: Potential upgrades along the AMD Instinct roadmap or integration of newer interconnects.

FAQ

  • How powerful is Frontier? 1.68 exaflops peak — the first exascale machine globally.
  • What is it used for? Climate modeling, nuclear simulations, materials science, AI training, and energy research.
  • How much power does it use? ~21 MW continuously, equivalent to tens of thousands of homes.
  • Why is it important? It’s a symbol of U.S. leadership in HPC, bridging science, national security, and AI.
  • Who built it? HPE, on the Cray EX platform, with AMD CPUs and GPUs, deployed at ORNL.