Marine Denolle

Answering the biggest questions about the Earth’s seismic activity requires two branches of seismology to connect their data in a new way: the observational community’s use of cloud computing for big-data analytics and processing of earthquake measurements, and the modeling community’s use of High-Performance Computing (HPC) to predict, or model, the seismic waves that earthquakes produce and the strong shaking that results. The massive amounts of observational data collected from an earthquake’s seismic waves can reveal vital information about the Earth, feeding more powerful models and helping answer questions that were previously out of reach, from how the Earth’s structure evolved over millions of years to how earthquakes occur.

A new four-year, $3.2 million project funded by the National Science Foundation aims to bridge that gap, uniting large-scale seismic data processing and modeling in the cloud with the supercomputing power of an HPC, and making it available for public use. The Seismic Computational Platform for Empowering Discovery, or SCOPED, project involves five universities: University of Washington, University of Alaska Fairbanks, Columbia University, University of Texas and the Colorado School of Mines. Marine Denolle, assistant professor in Earth and Space Sciences, will be the “big data person” for the SCOPED project, designing the data pipeline for the cloud.

“We are still at the beginning of cloud expansion, so we had to rethink the architecture of what this would look like,” said Denolle. “I’ve been trying to figure this out for the past few years. The UW-Seattle campus hosts the largest seismic archive, the IRIS-DMC, which streams data in from global seismic stations before delivering it back to the world. The NSF wants to merge this data center with UNAVCO, the NSF’s geodetic facility, to combine geophysical measurements of earthquakes on a cloud-based infrastructure.” SCOPED will create the bridge between the cloud and HPC for seismology.
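For context, seismologists already query the IRIS-DMC programmatically over its public web services. The snippet below is a minimal sketch of such a request using the open-source ObsPy library; the station, channel and time window are arbitrary placeholders, not details of the SCOPED project.

```python
# Minimal sketch: fetching waveform data from the IRIS-DMC via its
# FDSN web services, using ObsPy. The station, channel and time
# window are arbitrary examples, not part of the SCOPED design.
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

client = Client("IRIS")  # connect to the IRIS data center

# Request one hour of broadband vertical-component data from a
# Global Seismographic Network station (IU.ANMO, New Mexico).
start = UTCDateTime("2021-01-01T00:00:00")
stream = client.get_waveforms(network="IU", station="ANMO",
                              location="00", channel="BHZ",
                              starttime=start, endtime=start + 3600)
print(stream)  # prints a summary of the returned traces
```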

Components of both the cloud and HPC excel in their own right, but linking the two would make data far more robust and readily available. The cloud holds and quickly distributes vast amounts of data, while HPC systems can carry out complex computations much faster than personal (or general-purpose) PCs. Merging the two creates a useful feedback channel that pulls information down from the cloud to the HPC data center for processing, then sends it back to the cloud for the community’s use. The SCOPED project aims for the resulting platform to be user-friendly and accessible, producing software, code and datasets for research and helping train the next generation of seismologists. The project also aims to provide tools that distill complex computation down to a basic level of coding for entry-level scientists, a capability that currently sits only with highly trained seismologists.
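To make that loop concrete, here is a purely illustrative sketch of the three-step feedback channel in Python, using Amazon S3 object storage via the boto3 library. The bucket names, object keys and processing step are all hypothetical; SCOPED’s actual architecture may look quite different.

```python
# Illustrative sketch of the cloud -> HPC -> cloud feedback channel.
# Uses Amazon S3 via boto3 as the cloud storage layer; all bucket
# and key names are hypothetical, not SCOPED's real design.
import boto3

s3 = boto3.client("s3")

def process(in_path: str, out_path: str) -> None:
    """Placeholder for the compute-intensive step run on the HPC."""
    with open(in_path, "rb") as f:
        data = f.read()
    # ... real seismic processing would run here ...
    with open(out_path, "wb") as f:
        f.write(data)  # placeholder: passes the data through unchanged

# 1. Pull raw waveform data down from cloud storage to the HPC node.
s3.download_file("seismic-archive", "raw/waveforms.mseed", "waveforms.mseed")

# 2. Run the heavy computation on the HPC node.
process("waveforms.mseed", "waveforms_processed.mseed")

# 3. Push the derived product back to the cloud for the community.
s3.upload_file("waveforms_processed.mseed", "seismic-products",
               "processed/waveforms_processed.mseed")
```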

An example of the benefits of hosting data in the cloud is the Pacific Northwest Seismic Network website. The cloud structure makes sense here because people check the website only when there’s an earthquake, so it needs to deliver a small amount of information to many people simultaneously and quickly. That workload suits a cloud platform because the data is usually small enough not to require an HPC. “The goal of this project is to create something like this, but on a larger scale. We need lots of cheap, small computers that can be run independently like the cloud structure,” said Denolle.

The project team will start with a group of 30 scientists, mostly early-career computer scientists and geophysicists from UW and the four other institutions, who already possess the knowledge to decipher the data. The project will then run training workshops, in the form of UW eScience-led Hackweeks, to reach hundreds of students. Eventually, SCOPED aims to be a pilot experiment for other scientific fields that combine data integration and modeling.

“I’m really excited to start work on this feedback channel from the cloud to the HPC and then back onto the cloud,” said Denolle. “If we are able and want to scale this up, it will become the standard for seismology. We’re piloting this program to see if the hurdles can even be overcome, because nothing like this exists right now.”