Deployment Case Study: SKA Array
The Square Kilometre Array (SKA) is the world’s largest radio astronomy project, spanning two continents: SKA-Mid in South Africa and SKA-Low in Western Australia. When fully built, it will generate some of the largest scientific data streams in history, amounting to hundreds of petabytes of archived data annually, and will require exascale-class data processing and globally distributed storage. The SKA exemplifies a science-driven data pipeline that rivals the scale and complexity of AI data centers.
Overview
- Locations: SKA-Mid (Karoo, South Africa) and SKA-Low (Murchison, Australia)
- Array Scale: ~200 dishes (South Africa), ~130,000 antennas (Australia)
- Data Scale: ~600 PB/year of curated science data archived after on-site reduction; raw signal rates are far higher (see the throughput sketch after this list)
- Processing: Regional Science & Data Centres (RSDCs) in Europe, Asia, Africa, Australia
- Timeline: SKA1 construction underway, initial operations late 2020s; SKA2 planned for 2030s
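To put the overview figures in perspective, the back-of-the-envelope sketch below converts an annual archive volume into the sustained network throughput needed to move it. The ~600 PB/year figure comes from the overview above; the 80% link-utilization assumption and the 1 Tbit/s reference link are illustrative, not SKAO specifications.

```python
# Back-of-the-envelope: sustained throughput implied by an annual data volume.
# 600 PB/year is taken from the overview; utilization and the reference link
# size are illustrative assumptions, not SKAO specifications.

PB = 10**15                               # bytes per petabyte (decimal)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

annual_volume_bytes = 600 * PB            # ~600 PB/year of archived science data
utilization = 0.8                         # assume links average ~80% of capacity

sustained_bps = annual_volume_bytes * 8 / SECONDS_PER_YEAR
required_capacity_bps = sustained_bps / utilization

print(f"Sustained rate:           {sustained_bps / 1e9:.0f} Gbit/s")
print(f"Provisioned capacity:     {required_capacity_bps / 1e9:.0f} Gbit/s at {utilization:.0%} utilization")
print(f"Share of a 1 Tbit/s link: {required_capacity_bps / 1e12:.1%}")
```

The sustained rate works out to roughly 150 Gbit/s, well under a single Tbit/s link, though peak loads, reprocessing campaigns, and multiple destinations push the real provisioning requirement higher.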
Data Pipeline
| Stage | Location | Function | Notes |
| --- | --- | --- | --- |
| Signal Capture | Australia & South Africa sites | Raw radio signals collected by dishes and antennas | Extremely high bandwidth streams |
| Correlation & Filtering | On-site correlators (HPC racks) | Combine antenna/dish signals into usable data | Exascale-class compute near telescopes |
| Transmission | Dedicated fiber networks | Send processed data to national centers | Tbps-scale fiber requirements |
| Tier-0 Processing | Regional Science Data Centres (RSDCs) | Reduction, imaging, calibration pipelines | Exabyte storage distributed globally |
| Archival Storage | Global RSDCs (Europe, Asia, Africa, Australia) | ~600 PB/year curated for access | Accessible to international astronomy community |
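One way to reason about this pipeline is as a chain of stages, each passing on only a fraction of its input. The sketch below models that chain in Python; the assumed raw correlator output rate and per-stage reduction factors are illustrative (their product falls in the 10–100x range noted under HPC, Storage & Networking below), not SKAO design values.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    reduction: float   # output rate = input rate / reduction

TB_PER_S = 10**12              # bytes per second in 1 TB/s
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Illustrative stage chain; reduction factors are assumptions, not SKAO numbers.
stages = [
    Stage("Correlation & Filtering", reduction=10.0),
    Stage("Imaging & calibration",   reduction=5.0),
    Stage("Archival curation",       reduction=2.0),
]

rate = 1.0 * TB_PER_S          # assumed raw correlator output rate (illustrative)
print(f"{'Raw stream':30s} {rate / TB_PER_S:8.3f} TB/s")
for stage in stages:
    rate /= stage.reduction
    print(f"{stage.name:30s} {rate / TB_PER_S:8.3f} TB/s")

print(f"Archived volume: {rate * SECONDS_PER_YEAR / 10**15:.0f} PB/year")
```

With these particular factors the chain reduces the stream by 100x overall, landing in the few-hundred-PB/year range described in the archival row above.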
HPC, Storage & Networking
- Compute: On-site correlators near antennas, exascale HPC for imaging and analysis.
- Storage: Hundreds of petabytes annually distributed across global RSDCs (a per-site sizing sketch follows this list).
- Networking: Tbps-scale dedicated fiber from remote desert sites to HPC hubs.
- Real-Time Reduction: Raw signals reduced by factors of 10–100 before storage.
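The storage bullet above raises a sizing question: how much new capacity does each regional centre need per year to hold its share of the output? The sketch below spreads an assumed ~600 PB/year across a hypothetical set of RSDCs with a replication factor of two; the site list, shares, and replication factor are illustrative assumptions, not the SKAO's allocation.

```python
# Rough per-RSDC storage sizing for one year of SKA data products.
# Site names, shares, and the replication factor are illustrative assumptions.

annual_pb = 600            # PB/year of curated data products (from this case study)
replication = 2            # assume each product is held at two sites for resilience

rsdc_shares = {            # hypothetical fraction of the archive hosted per region
    "Europe":    0.40,
    "Asia":      0.25,
    "Australia": 0.20,
    "Africa":    0.15,
}

total_stored = annual_pb * replication
for site, share in rsdc_shares.items():
    print(f"{site:10s} {total_stored * share:6.0f} PB of new capacity per year")
print(f"{'Total':10s} {total_stored:6.0f} PB per year across all sites")
```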
Partners & Stakeholders
| Partner | Role |
| --- | --- |
| SKAO (Square Kilometre Array Observatory) | Central management, coordination |
| South Africa | Host of SKA-Mid, Karoo site |
| Australia | Host of SKA-Low, Murchison site |
| International Partners | Europe, Asia, Americas: RSDCs & funding |
| CERN & Exascale HPC Partners | Technical collaboration on exascale compute pipelines |
Key Challenges
- Data Volume: Raw radio signal rates would amount to exabytes per year if stored unprocessed; real-time reduction to manageable volumes is essential.
- Networking: Remote desert sites need Tbps-class fiber to global HPC centers.
- Power: Remote facilities must combine renewable generation with grid supply to sustain continuous uptime.
- Distributed Collaboration: 15+ nations and multiple RSDCs coordinating processing and access.
- Longevity: Maintaining petabyte-scale archives for decades (see the growth sketch after this list).
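Longevity is easy to underestimate: a steady intake compounded over decades pushes the archive into exabyte territory. The projection below uses assumed parameters (constant 600 PB/year intake, two replicas, a 20-year horizon) purely for illustration; it is not an SKAO forecast.

```python
# Simple projection of cumulative archive size over the observatory lifetime.
# Intake rate, replication, and horizon are assumptions for illustration only.

annual_intake_pb = 600     # PB/year of curated data products
replication = 2            # copies kept across RSDCs
horizon_years = 20         # assumed operating horizon

for year in range(5, horizon_years + 1, 5):
    total_pb = annual_intake_pb * replication * year
    print(f"Year {year:2d}: {total_pb:6.0f} PB  (~{total_pb / 1000:.1f} EB)")
```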
Strategic Importance
- Cosmology: Map hydrogen distribution, study galaxy formation, probe dark energy.
- Technology: Pushes the envelope on distributed exascale data management.
- Global Infrastructure: One of the largest scientific collaborations ever, spanning continents.
- Cross-Domain Learnings: Techniques for streaming, reduction, and distributed storage mirror challenges in AI data centers.
FAQ
- How much data does SKA generate? Roughly 600 PB/year of archived data; raw signal rates are far higher and are reduced in real time before archival.
- Where is the data processed? At on-site correlators, then at Regional Science Data Centres worldwide.
- What’s unique about SKA? It spans two continents and produces radio astronomy data at an unprecedented scale.
- How is this like AI data centers? Both rely on exascale HPC, storage, and networking to manage petabyte-scale data streams.
- What’s next? SKA2 expansion in the 2030s, scaling to even larger data flows.