Deployment Case Study: SKA Array


The Square Kilometre Array (SKA) is the world’s largest radio astronomy project, with telescopes on two continents: SKA-Mid in South Africa and SKA-Low in Western Australia. When fully built, it will generate some of the largest scientific data streams in history, hundreds of petabytes annually, requiring exascale-class data processing and globally distributed storage. The SKA exemplifies a science-driven data pipeline that rivals AI data centers in scale and complexity.


Overview

  • Locations: SKA-Mid (Karoo, South Africa) and SKA-Low (Murchison, Australia)
  • Array Scale: ~200 dishes (South Africa), ~130,000 antennas (Australia)
  • Data Scale: ~600 PB/year of curated science data; the raw streams are far larger and are reduced in real time
  • Processing: Regional Science Data Centres (RSDCs) in Europe, Asia, Africa, Australia
  • Timeline: SKA1 construction underway, initial operations late 2020s; SKA2 planned for 2030s
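
To put the ~600 PB/year figure in perspective, it can be converted into a sustained average data rate. The sketch below is illustrative arithmetic only: it assumes decimal petabytes (10^15 bytes), a 365-day year, and a perfectly even flow, and the function name `sustained_rate` is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year
BYTES_PER_PB = 10**15               # decimal petabyte (assumption)


def sustained_rate(pb_per_year: float) -> tuple[float, float]:
    """Return the average rate as (gigabytes per second, gigabits per second)."""
    bytes_per_second = pb_per_year * BYTES_PER_PB / SECONDS_PER_YEAR
    gbytes_per_second = bytes_per_second / 1e9
    gbits_per_second = gbytes_per_second * 8
    return gbytes_per_second, gbits_per_second


gb_s, gbit_s = sustained_rate(600)
print(f"{gb_s:.1f} GB/s, {gbit_s:.0f} Gbit/s")  # prints "19.0 GB/s, 152 Gbit/s"
```

This is a year-round average; peak rates during observations would be higher.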

Data Pipeline

| Stage | Location | Function | Notes |
|---|---|---|---|
| Signal Capture | Australia & South Africa sites | Raw radio signals collected by dishes and antennas | Extremely high bandwidth streams |
| Correlation & Filtering | On-site correlators (HPC racks) | Combine antenna/dish signals into usable data | Exascale-class compute near telescopes |
| Transmission | Dedicated fiber networks | Send processed data to national centers | Tbps-scale fiber requirements |
| Tier-0 Processing | Regional Science Data Centres (RSDCs) | Reduction, imaging, calibration pipelines | Exabyte storage distributed globally |
| Archival Storage | Global RSDCs (Europe, Asia, Africa, Australia) | ~600 PB/year curated for access | Accessible to international astronomy community |

HPC, Storage & Networking

  • Compute: On-site correlators near antennas, exascale HPC for imaging and analysis.
  • Storage: Hundreds of petabytes annually distributed across global Tier-1 RSDCs.
  • Networking: Tbps-scale dedicated fiber from remote desert sites to HPC hubs.
  • Real-Time Reduction: Raw signals reduced by factors of 10–100 before storage.
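
The reduction factors above imply a range for the data volume upstream of storage. The following sketch back-calculates that range. It assumes, as an interpretation rather than a published project figure, that ~600 PB/year is the volume after reduction; `implied_input_volume` is a hypothetical helper.

```python
def implied_input_volume(stored_pb_per_year: float, reduction_factor: float) -> float:
    """Volume before reduction (PB/year), given the stored volume and reduction factor."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return stored_pb_per_year * reduction_factor


STORED_PB_PER_YEAR = 600  # assumed to be the post-reduction volume

for factor in (10, 100):
    input_pb = implied_input_volume(STORED_PB_PER_YEAR, factor)
    # 1 EB = 1000 PB in decimal units
    print(f"factor {factor:>3}: {input_pb / 1000:.0f} EB/year before reduction")
```

Under these assumptions the pre-reduction volume falls between roughly 6 and 60 EB/year, which is why reduction has to happen before anything is written to long-term storage.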

Partners & Stakeholders

| Partner | Role |
|---|---|
| SKAO (Square Kilometre Array Observatory) | Central management, coordination |
| South Africa | Host of SKA-Mid, Karoo site |
| Australia | Host of SKA-Low, Murchison site |
| International Partners | Europe, Asia, Americas: RSDCs and funding |
| CERN & Exascale HPC Partners | Technical collaboration on exascale compute pipelines |

Key Challenges

  • Data Volume: Raw radio data volumes reach the exabyte range and must be reduced in real time to manageable sizes.
  • Networking: Remote desert sites need Tbps-class fiber to global HPC centers.
  • Power: Remote facilities must combine renewable generation with grid supply to maintain continuous uptime.
  • Distributed Collaboration: 15+ nations and multiple RSDCs coordinating processing and access.
  • Longevity: Maintaining archives that grow by hundreds of petabytes per year for decades.
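
The longevity challenge becomes clearer when the annual volume is accumulated over time. This is a minimal sketch assuming a constant intake of ~600 PB/year with nothing deleted and no SKA2 growth; `cumulative_archive_eb` is a hypothetical helper, and real retention policies would change the totals.

```python
def cumulative_archive_eb(pb_per_year: float, years: int) -> float:
    """Total archive size in exabytes after `years` of constant annual intake."""
    return pb_per_year * years / 1000  # 1 EB = 1000 PB (decimal units)


for years in (1, 10, 30):
    print(f"{years:>2} years: {cumulative_archive_eb(600, years):.1f} EB")
```

Under these assumptions the archive passes 6 EB after a decade and 18 EB after thirty years.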

Strategic Importance

  • Cosmology: Map hydrogen distribution, study galaxy formation, probe dark energy.
  • Technology: Pushes the envelope on distributed exascale data management.
  • Global Infrastructure: One of the largest scientific collaborations ever, spanning continents.
  • Cross-Domain Learnings: Techniques for streaming, reduction, and distributed storage mirror challenges in AI data centers.

FAQ

  • How much data does SKA generate? About 600 PB/year is archived after real-time reduction; the raw streams are 10–100 times larger.
  • Where is the data processed? At on-site correlators, then at Regional Science Data Centres worldwide.
  • What’s unique about SKA? It spans two continents and produces radio astronomy data at an unprecedented scale.
  • How is this like AI data centers? Both rely on exascale HPC, storage, and networking to manage petabyte-scale data streams.
  • What’s next? SKA2 expansion in the 2030s, scaling to even larger data flows.