Data Center Server Cluster Layer
The pod or cluster layer scales beyond individual racks to form tightly coupled compute units. This is the level at which large-scale AI training, HPC simulations, and cloud workloads are orchestrated. Pods integrate dozens of racks with high-bandwidth fabrics, shared storage, and liquid cooling distribution. They represent the true building block of an AI factory, enabling workloads that exceed the capabilities of any single rack.
Architecture & Design Trends
- High-Bandwidth Fabrics: Clusters rely on InfiniBand HDR/NDR and Ethernet 400G/800G fabrics to link racks into low-latency domains.
- Memory Pooling: CXL-based switches enable pooled memory accessible across servers in multiple racks.
- Parallel Storage: Cluster-wide parallel file systems (Lustre, GPFS, BeeGFS) keep data delivery in step with the throughput demands of AI training.
- Liquid Distribution: Coolant Distribution Units (CDUs) and Manifold Distribution Units (MDUs) balance liquid flow across dozens of racks.
- Prefabrication: Modular containerized pods and MEP skids are delivered as factory-assembled units to accelerate deployment.
- Software Orchestration: Workload managers such as Slurm, Kubernetes, and Ray schedule and place jobs across the cluster fabric (a minimal scheduling sketch follows this list).
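To ground the orchestration bullet, here is a minimal Ray sketch that fans GPU tasks out across a cluster. It assumes a Ray cluster has already been formed (e.g., via `ray start`) and that each node exposes its GPUs; `train_shard` is a hypothetical placeholder for a real training task, not part of any vendor platform.

```python
# Minimal Ray sketch: scheduling GPU tasks across a running cluster.
# Assumes the cluster already exists and each node exposes its GPUs;
# `train_shard` is a hypothetical stand-in for real work.
import ray

ray.init(address="auto")  # attach to the existing cluster

@ray.remote(num_gpus=1)   # ask the scheduler for one GPU per task
def train_shard(shard_id: int) -> str:
    import socket
    # A real training step would run here.
    return f"shard {shard_id} ran on {socket.gethostname()}"

# Fan 16 tasks out; Ray places each on a node with a free GPU.
futures = [train_shard.remote(i) for i in range(16)]
for result in ray.get(futures):
    print(result)
```

Slurm and Kubernetes fill the same role at different layers: Slurm for batch-style HPC jobs, Kubernetes for containerized workloads and services.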
AI Training vs General-Purpose Clusters
| Dimension | AI Training Clusters | General-Purpose Clusters |
| --- | --- | --- |
| Primary Workload | AI training, LLMs, HPC simulations | Cloud hosting, virtualization, enterprise IT |
| Compute | GPU-dense racks (1000s of GPUs) | CPU-dominated racks with mixed VMs |
| Networking | 400–800G Ethernet, InfiniBand NDR, optical fabrics | 10–100G Ethernet, basic spine/leaf |
| Storage | Parallel FS delivering TB/s bandwidth | SAN/NAS for enterprise workloads |
| Cooling | Cluster-level CDUs, liquid loops | Air cooling, limited liquid assistance |
| Power | Redundant UPS and high-capacity busbars | Standard UPS, lower kW per rack |
| Scale | 100s–1000s of nodes optimized for AI | 10s–100s of nodes optimized for IT |
| Cost | $50M–$500M+ per large AI cluster | $1M–$10M typical enterprise cluster |
Notable Vendors
| Vendor | Product / Platform | Cluster Form Factor | Key Features |
| --- | --- | --- | --- |
| NVIDIA | DGX SuperPOD | Factory-integrated AI cluster | Up to 1000+ GPUs, InfiniBand NDR, liquid-cooled |
| AMD | MI300X Supercluster reference designs | GPU-centric clusters | Infinity Fabric, CXL memory expansion |
| Intel | Gaudi2 Cluster Kits | Rack-scale clusters | AI accelerator clusters with integrated networking |
| HPE Cray | EX Supercomputing System | Cluster / supercomputer | Optimized for HPC + AI hybrid workloads |
| Dell Technologies | AI Factory Clusters | Rack-integrated solutions | XE9680 racks combined into turnkey AI clusters |
| Supermicro | AI SuperCluster Solutions | Rack-scale clusters | Prefabricated GPU racks + liquid distribution |
| Inspur | AIStation / NF5688M6 clusters | GPU superclusters | China’s largest AI training cluster supplier |
Cluster Bill of Materials (BOM)
| Domain | Examples | Role |
| --- | --- | --- |
| Compute | Dozens–hundreds of GPU/CPU racks | Aggregates into large-scale compute domains |
| Memory | CXL switches, pooled memory fabrics | Shares memory across multiple racks |
| Storage | Parallel FS (Lustre, GPFS, BeeGFS), NVMe-oF arrays | Delivers high-throughput, low-latency data access |
| Networking | Spine switches, InfiniBand HDR/NDR, Ethernet 400/800G, optical interconnects | Provides high-bandwidth cluster fabric |
| Power | Cluster-level busbars, redundant UPS feeds | Ensures resilient power delivery across racks |
| Cooling | CDUs, MDUs, secondary liquid loops | Balances coolant flow across multiple racks |
| Orchestration | Kubernetes, Slurm, Ray, integrated DCIM hooks | Schedules workloads across nodes and racks |
| Monitoring & Security | Telemetry systems, IDS/IPS, access zones | Provides cluster-wide visibility and protection |
| Prefabrication | Containerized pods, prefabricated MEP skids | Accelerates deployment and standardizes clusters |
Key Challenges
- Networking Bottlenecks: Even with 400–800G fabrics, the east–west traffic of synchronous AI training stresses interconnects (see the all-reduce estimate after this list).
- Storage Throughput: Parallel file systems must deliver terabytes per second of bandwidth to avoid starving GPUs (see the sizing sketch after this list).
- Cooling Distribution: Balancing coolant across racks requires advanced CDUs/MDUs and leak detection systems.
- Power Coordination: UPS and redundant feeds must scale consistently across dozens of racks.
- Software Complexity: Orchestrating thousands of GPUs across racks introduces scheduling and failure domain challenges.
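To put the networking challenge in numbers, the back-of-envelope sketch below estimates per-GPU all-reduce traffic during synchronous data-parallel training. The model size, GPU count, and link rate are illustrative assumptions, not measurements.

```python
# Back-of-envelope estimate of east-west traffic for a ring all-reduce.
# All inputs are illustrative assumptions.

def ring_allreduce_bytes_per_gpu(payload_bytes: float, n_gpus: int) -> float:
    """Each GPU sends (and receives) 2 * (n-1)/n * payload per all-reduce."""
    return 2 * (n_gpus - 1) / n_gpus * payload_bytes

params = 70e9          # assumed model size: 70B parameters
bytes_per_grad = 2     # fp16/bf16 gradients
n_gpus = 1024
link_gbps = 400        # assumed per-GPU fabric bandwidth, in Gb/s

payload = params * bytes_per_grad
traffic = ring_allreduce_bytes_per_gpu(payload, n_gpus)
seconds = traffic / (link_gbps / 8 * 1e9)  # convert Gb/s to bytes/s

print(f"per-GPU all-reduce traffic: {traffic / 1e9:.0f} GB per step")
print(f"serialized transfer time at {link_gbps}G: {seconds:.1f} s per step")
```

At these assumed values, each gradient synchronization moves roughly 280 GB per GPU, several seconds of wire time at 400G if not hidden, which is why clusters overlap communication with computation and keep pushing fabric speeds.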
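The storage challenge yields to the same kind of sizing exercise; again, every input here is an assumption chosen for illustration.

```python
# Sizing sketch: aggregate storage bandwidth needed to keep GPUs fed.
# All inputs are illustrative assumptions.

n_gpus = 2048                  # assumed cluster size
gb_per_sec_per_gpu = 0.5       # assumed sustained read rate per GPU
checkpoint_tb = 1.0            # assumed checkpoint size
checkpoint_window_s = 60       # target: write the checkpoint within a minute

steady_state = n_gpus * gb_per_sec_per_gpu          # GB/s of read traffic
burst = checkpoint_tb * 1000 / checkpoint_window_s  # GB/s of write burst

print(f"steady-state read demand: {steady_state / 1000:.1f} TB/s")
print(f"checkpoint write burst:   {burst:.0f} GB/s")
```

Even a modest per-GPU read rate aggregates to terabyte-per-second demand at cluster scale, which is what pushes designs toward parallel file systems rather than conventional NAS.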
Future Outlook
- Optical Interconnects: Silicon photonics will dominate cluster fabrics by the late 2020s, reducing latency and heat.
- Memory Disaggregation: Pooled CXL memory will become standard in AI clusters, reducing stranded resources.
- Composable Infrastructure: Dynamic allocation of compute, memory, and storage will make clusters more flexible.
- Liquid Cooling Expansion: Expect CDUs and MDUs to be mandatory for all AI training clusters within a few years.
- Standardization: OCP-inspired reference architectures will drive consistency across hyperscalers.
FAQ
- What is a pod in data center design? A pod is a modular group of racks, often prefabricated, that forms the building block of larger clusters.
- How many racks are in a typical AI cluster? Anywhere from 16 to 256+ racks depending on workload scale.
- What differentiates an AI cluster from an HPC cluster? HPC clusters focus on scientific simulations; AI clusters are optimized for GPU scaling and model training.
- Are AI clusters prefabricated? Increasingly, yes: vendors deliver containerized pods or rack-scale systems to reduce deployment time.
- What orchestration software is used? Slurm, Kubernetes, Ray, and vendor platforms such as NVIDIA Base Command manage workloads (see the Kubernetes sketch below).
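As a concrete example of the Kubernetes path, the sketch below submits a GPU job through the official kubernetes Python client. The image name, namespace, job name, and GPU count are placeholder assumptions; it also presumes the NVIDIA device plugin is installed so that `nvidia.com/gpu` is a schedulable resource.

```python
# Minimal sketch: submitting a GPU training job via the Kubernetes Python client.
# Image, namespace, and job name are placeholders; assumes the NVIDIA
# device plugin exposes `nvidia.com/gpu` on worker nodes.
from kubernetes import client, config

config.load_kube_config()  # read the local kubeconfig

container = client.V1Container(
    name="trainer",
    image="example.com/trainer:latest",   # placeholder image
    command=["python", "train.py"],
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "8"}    # request 8 GPUs on one node
    ),
)

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="llm-train"),  # placeholder job name
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never")
        )
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```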