Deployment Case Study: SKA Array
The Square Kilometre Array (SKA) is the world’s largest radio astronomy project, spanning two continents: SKA-Mid in South Africa and SKA-Low in Western Australia. When fully built, it will generate some of the largest scientific data streams in history, archiving hundreds of petabytes of science data each year and requiring exascale-class data processing and globally distributed storage. The SKA exemplifies a science-driven data pipeline that rivals AI data centers in scale and complexity.
Overview
- Locations: SKA-Mid (Karoo, South Africa) and SKA-Low (Murchison, Australia)
- Array Scale: ~200 dishes (South Africa), ~130,000 antennas (Australia)
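- Data Scale: ~600 PB/year of archived science data products; raw signal rates are far higher and are reduced on-site before archival
- Processing: On-site correlators, Science Data Processors in Cape Town and Perth, and Regional Science Data Centres (RSDCs; officially the SKA Regional Centre Network) in Europe, Asia, Africa, Australia
- Timeline: SKA1 construction underway since 2021, with initial operations in the late 2020s; an SKA2 expansion is envisaged for the 2030s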
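The array sizes above hint at why correlation is so compute-intensive: an interferometer correlates every pair of receiving elements (each pair is a baseline), so the work grows roughly as N(N-1)/2. The sketch below counts baselines. The figures of 197 SKA-Mid dishes and 512 SKA-Low stations of 256 antennas each are assumptions consistent with the "~200" and "~130,000" figures above, not values stated in this document.

```python
def baselines(n_elements: int) -> int:
    """Number of distinct cross-correlation pairs (baselines) among n elements."""
    return n_elements * (n_elements - 1) // 2


# Assumed array sizes, consistent with the overview figures:
SKA_MID_DISHES = 197           # "~200 dishes"
SKA_LOW_STATIONS = 512         # assumed grouping of SKA-Low antennas into stations
SKA_LOW_ANTENNAS = 512 * 256   # 131,072, i.e. "~130,000 antennas"

if __name__ == "__main__":
    print(f"SKA-Mid dish baselines:    {baselines(SKA_MID_DISHES):,}")
    print(f"SKA-Low station baselines: {baselines(SKA_LOW_STATIONS):,}")
    print(f"SKA-Low antenna baselines: {baselines(SKA_LOW_ANTENNAS):,}")
```

Correlating all ~131,000 SKA-Low antennas individually would mean roughly 8.6 billion baselines, versus about 131,000 for 512 stations, which is one reason low-frequency arrays combine antennas into stations before correlation.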
Data Pipeline
| Stage | Location | Function | Notes |
|---|---|---|---|
| Signal Capture | SKA-Low (Australia) and SKA-Mid (South Africa) sites | Raw radio signals collected by antennas and dishes | Extremely high-bandwidth streams |
| Correlation & Filtering | On-site correlators (Central Signal Processor) | Combine antenna/dish signals into visibilities and beams | High-throughput dedicated signal processing near the telescopes |
| Transmission | Dedicated fiber networks | Send correlated data to the Science Data Processors in Cape Town and Perth | Tbps-scale fiber requirements |
| Tier-0 Processing | Science Data Processor (SDP) HPC facilities in Cape Town and Perth | Calibration, imaging, and data reduction pipelines | Exascale-class processing demands |
| Archival Storage | Global RSDCs (Europe, Asia, Africa, Australia) | ~600 PB/year of science data products curated for access | Exabyte-scale storage accumulating over time; accessible to the international astronomy community |
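The Correlation & Filtering stage can be illustrated with a toy FX correlator: each antenna's samples are first Fourier-transformed into frequency channels (the F step), then every antenna pair is cross-multiplied channel by channel and accumulated (the X step). This is a minimal pure-Python sketch for intuition only. The function names are hypothetical, it uses a naive O(n²) DFT, and production correlators instead run FFT-based filter banks on dedicated hardware.

```python
import cmath
from itertools import combinations


def dft(samples):
    """Naive discrete Fourier transform (the 'F' step); O(n^2), illustration only."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def fx_correlate(streams, block_size):
    """Toy FX correlator (hypothetical helper).

    streams: equal-length sample lists, one per antenna or dish.
    Returns {(i, j): [visibility per frequency channel]} for every baseline i < j,
    summed over all complete blocks of `block_size` samples.
    """
    n_blocks = len(streams[0]) // block_size
    vis = {pair: [0j] * block_size for pair in combinations(range(len(streams)), 2)}
    for b in range(n_blocks):
        block = slice(b * block_size, (b + 1) * block_size)
        # F step: channelise each antenna's block into frequency channels.
        spectra = [dft(s[block]) for s in streams]
        # X step: cross-multiply every antenna pair, channel by channel, and accumulate.
        for (i, j), acc in vis.items():
            for k in range(block_size):
                acc[k] += spectra[i][k] * spectra[j][k].conjugate()
    return vis


if __name__ == "__main__":
    cos_wave = [1, 0, -1, 0] * 4  # identical tone on antennas 0 and 1
    sin_wave = [0, 1, 0, -1] * 4  # same tone delayed by a quarter period, on antenna 2
    vis = fx_correlate([cos_wave, cos_wave, sin_wave], block_size=4)
    for pair, channels in vis.items():
        print(pair, [complex(round(c.real, 6), round(c.imag, 6)) for c in channels])
```

Identical signals on two antennas give real, positive visibilities, while the quarter-period delay on antenna 2 appears as a 90° phase in the cross-products. Phase differences of this kind carry the geometric information that later imaging stages rely on.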
HPC, Storage & Networking
- Compute: On-site correlators near the antennas; exascale-class HPC at the Science Data Processors for calibration and imaging, with further analysis at the RSDCs.
- Storage: Hundreds of petabytes annually distributed across global Tier-1 RSDCs.
- Networking: Tbps-scale dedicated fiber from remote desert sites to HPC hubs.
- Real-Time Reduction: Raw signals reduced by factors of 10–100 before storage.
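Real-time reduction can take several forms. One common technique is time averaging: summing consecutive integrations and dividing by their count, which cuts data volume by the averaging factor. The generator below is a hypothetical sketch of that idea for factors in the 10 to 100 range quoted above; it is not SKA's actual pipeline, and real pipelines typically also average in frequency and flag interference.

```python
from typing import Iterable, Iterator, List


def time_average(stream: Iterable[List[complex]], factor: int) -> Iterator[List[complex]]:
    """Average every `factor` consecutive integrations (hypothetical helper).

    Each integration is a list of per-channel values. The output holds 1/factor
    as many integrations as the input; a trailing partial group is dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    acc: List[complex] = []
    count = 0
    for integration in stream:
        # Start a new running sum, or add this integration channel by channel.
        acc = list(integration) if count == 0 else [a + x for a, x in zip(acc, integration)]
        count += 1
        if count == factor:
            yield [a / factor for a in acc]
            count = 0


if __name__ == "__main__":
    raw = ([complex(i), complex(2 * i)] for i in range(1000))  # 1,000 integrations, 2 channels
    reduced = list(time_average(raw, factor=100))
    print(len(reduced))  # 10 integrations remain: a 100x reduction
```

Because the function is a generator, it processes integrations as they arrive and holds only one running sum in memory, which suits streaming data. Averaging too aggressively smears sources far from the phase centre, so in practice the factor is bounded by science requirements.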
Partners & Stakeholders
| Partner | Role |
|---|---|
| SKAO (Square Kilometre Array Observatory) | Central management, coordination |
| South Africa | Host of SKA-Mid, Karoo site |
| Australia | Host of SKA-Low, Murchison site |
| International Partners | Europe, Asia, Americas — RSDCs & funding |
| CERN & Exascale HPC Partners | Technical collaboration on exascale compute pipelines |
Key Challenges
- Data Volume: Raw radio data rates far exceed what can be stored or shipped; real-time reduction brings them down to manageable sizes.
- Networking: Remote desert sites need Tbps-class fiber to global HPC centers.
- Power: Remote facilities must combine renewable generation with grid supply to sustain continuous operations.
- Distributed Collaboration: 15+ nations and multiple RSDCs coordinating processing and access.
- Longevity: Maintaining exabyte-scale archives for decades.
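The data-volume and longevity challenges can be made concrete with back-of-envelope arithmetic on the ~600 PB/year figure: the sustained average rate it implies, and the cumulative archive after a given number of years. Decimal units are assumed, and the `copies` replication parameter is a hypothetical illustration, not an SKA specification.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds


def sustained_rate_gbps(petabytes_per_year: float) -> float:
    """Average line rate in Gb/s needed to move a yearly volume (decimal units)."""
    bits_per_year = petabytes_per_year * 1e15 * 8
    return bits_per_year / SECONDS_PER_YEAR / 1e9


def cumulative_archive_eb(petabytes_per_year: float, years: int, copies: int = 1) -> float:
    """Stored volume in EB after `years`, keeping `copies` replicas (hypothetical parameter)."""
    return petabytes_per_year * years * copies / 1000


if __name__ == "__main__":
    print(f"Average rate: {sustained_rate_gbps(600):.1f} Gb/s")
    print(f"After 10 years: {cumulative_archive_eb(600, 10):.1f} EB")
    print(f"After 10 years, 2 copies: {cumulative_archive_eb(600, 10, copies=2):.1f} EB")
```

For 600 PB/year this gives roughly 152 Gb/s sustained on average and 6 EB after a decade for a single copy. Bursty observing, reprocessing, and any replicas push real requirements higher than these averages.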
Strategic Importance
- Cosmology: Map hydrogen distribution, study galaxy formation, probe dark energy.
- Technology: Pushes the envelope on distributed exascale data management.
- Global Infrastructure: One of the largest scientific collaborations ever, spanning continents.
- Cross-Domain Learnings: Techniques for streaming, reduction, and distributed storage mirror challenges in AI data centers.
FAQ
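- How much data does SKA generate? Raw signal rates are far higher than can be stored; after real-time reduction, ~600 PB/year of science data products reach the archive.
- Where is the data processed? At on-site correlators first, then at the Science Data Processors in Cape Town and Perth for calibration and imaging, and finally at Regional Science Data Centres (RSDCs) worldwide for analysis and access.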
- What’s unique about SKA? It spans two continents and produces radio astronomy data at an unprecedented scale.
- How is this like AI data centers? Both rely on exascale HPC, storage, and networking to manage petabyte-scale data streams.
- What’s next? A proposed SKA2 expansion in the 2030s, scaling to even larger data flows.