It’s Not Always About “The Big One”: Why We Mine for the Little Ones, Too

Concept illustration of the "Mining Seismic Wavefields" (MSW) project / by Jason Ballmann

What’s a bit new, trendy, and even a little quirky in earthquake science? Mining through extensive earthquake data in clever, innovative ways – especially in search of the "little ones" we never feel!

But why? From decades of recording seismic activity, scientists have amassed volumes of earthquake data (and other “noise”). A new approach to research is allowing experts to identify earthquakes that were missed or misinterpreted. Tiny earthquakes are the ones most often overlooked.

Scientists are also hoping this research can help to: enhance our knowledge of how earthquakes work; find faults we didn’t know existed; improve seismic monitoring equipment and techniques; and maybe even one day inform the prediction of earthquakes. (While we are still nowhere close to predicting earthquakes, we are trying to at least forecast them.)

Of course, we know about significant earthquakes because we often feel them. They stand out amid the dozens (if not hundreds) of typical, small, and unfelt earthquakes that occur each day in California alone. The focus of this project, rather, is on the tiny ones, well below magnitude 2.0. At most, these seemingly insignificant temblors cause as much of a disturbance to our society as your neighborhood garbage truck.

Figure 1. Map showing relocated seismicity for southern California and high-resolution relocations of the ongoing Cahuilla swarm, near Temecula, southern California. The Cahuilla swarm defines a ~6 km long fault dipping steeply to the east (Hauksson et al., 2018).

That garbage truck is causing more problems than just waking you up, however. Along with helicopter noise and even those pesky Diablo or Santa Ana winds, many manmade and natural intrusions make mining seismic data for the smallest earthquakes much more difficult.

With the aid of artificial intelligence, scientists can develop programs that let supercomputers sift quickly and more accurately through live, continuous data and older catalogs, under careful and meticulous supervision, to answer these questions:

  • What earthquakes are going undetected because they’re so close to each other in time and space that they look like a single earthquake, when in fact there may be several?
  • What earthquakes are going undetected because they look like helicopter or garbage truck sounds, weather phenomena, or other random, unexpected noise?
  • What does this all mean for our understanding of earthquake processes and our seismic future?

These are the fundamental issues of the Mining Seismic Wavefields (MSW) project, a research collaboration between Stanford, USC, Caltech, and Georgia Tech (funded by NSF’s Geoinformatics program). The research team has developed and demonstrated new methods for seismological data mining. When different earthquakes occur near one another, the waveforms recorded by seismic sensors may bear the same signature of the waves’ interaction with the Earth’s complex crust. Waveform similarity may also arise because seismic sources are recorded at instruments close to one another. The MSW project exploits both kinds of similarity, and the team hopes its methods will inspire many other techniques too.

“Template Matching” Method

Waveform similarity due to nearby sources. Template matching is an earthquake detection approach that searches continuous seismic data for signals that match the wiggles of known earthquakes. 

The group at Caltech (co-PI Egill Hauksson and postdoc Zach Ross) developed and applied a template matching algorithm (QTMatch). Early in the project, with the assistance of SCEC Associate Director for Information Technology Phil Maechling, they carried out template matching on NSF supercomputers. Subsequently, they tailored QTMatch to run efficiently on local GPU-based supercomputers and applied it to the entire continuous waveform archive of the Southern California Seismic Network, using the seismograms of ~300,000 previously known earthquakes as templates.

This ambitious effort detected ~2.4 million earthquakes for the period 2008-2017, which is ~13 times as many events as in the original catalog. They teamed with Peter Shearer at Scripps and Daniel Trugman at Los Alamos National Laboratory to locate these newly discovered earthquakes using 1.3 billion recordings of these earthquakes from many local seismometers. The unprecedented detail in this next-generation seismicity catalog is evident in the seismicity of the Cahuilla swarm (Figure 1), and will facilitate important new insights into earthquake activity in southern California.

Template matching is an example of “informed” search in which we know ahead of time what signals we are looking for. This is an effective approach for a well-studied region like Southern California. 
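The core of template matching can be sketched in a few lines of Python. This is a minimal, single-channel illustration, not QTMatch: it slides a known earthquake's waveform (the template) along continuous data, computes the normalized cross-correlation coefficient at each lag, and flags local peaks above a fixed threshold. The function names and the 0.9 threshold are hypothetical choices for illustration; production codes typically combine correlations from many stations and channels and set the detection threshold relative to the noise in the correlation trace.

```python
import math


def normalized_cross_correlation(template, data):
    """Correlation coefficient (-1 to 1) between the template and every
    equal-length window of the continuous data, one value per lag."""
    n = len(template)
    t_mean = sum(template) / n
    t_dev = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t_dev))
    cc = []
    for lag in range(len(data) - n + 1):
        window = data[lag:lag + n]
        w_mean = sum(window) / n
        w_dev = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w_dev))
        if t_norm == 0 or w_norm == 0:
            cc.append(0.0)  # flat window: correlation undefined, treat as no match
        else:
            cc.append(sum(a * b for a, b in zip(t_dev, w_dev)) / (t_norm * w_norm))
    return cc


def detect(template, data, threshold=0.9):
    """Lags where the correlation is a local peak at or above the threshold."""
    cc = normalized_cross_correlation(template, data)
    hits = []
    for i, value in enumerate(cc):
        left = cc[i - 1] if i > 0 else -math.inf
        right = cc[i + 1] if i + 1 < len(cc) else -math.inf
        if value >= threshold and value >= left and value > right:
            hits.append(i)
    return hits
```

Because the coefficient is normalized by amplitude, a copy of the template that is several times weaker scores just as highly in this noise-free toy, which is why template matching can find events much smaller than the template earthquake.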

To extend the similarity search to less well-studied areas, the Stanford group (PI Greg Beroza and graduate students Clara Yoon* and Karianne Bergen§) developed a new search algorithm called FAST (Fingerprinting and Similarity Thresholding). FAST performs computationally efficient similarity search by adapting technologies originally developed for other purposes, such as audio clip identification. Applying FAST to the first 3 months of the 2010 Guy-Greenbrier, Arkansas, sequence increased the number of cataloged earthquakes from 75 to over 14,000. This illuminated the relationship between microseismic activity and hydraulic stimulation (fracking) for unconventional hydrocarbon development, as shown in Figure 2.

The FAST algorithm is now being used to detect small earthquakes in diverse settings around the world. 
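The key to FAST's efficiency is avoiding a comparison of every window of data with every other window. The sketch below illustrates the general fingerprint-and-hash idea under deliberately simplified assumptions; it is not the Stanford group's code. Each window is reduced to a small binary fingerprint (here, hypothetically, the positions and signs of the largest sample-to-sample changes, whereas FAST's published fingerprints are built from spectral and wavelet features). MinHash signatures and banded locality-sensitive hashing then place windows with similar fingerprints in shared buckets, so only those candidate pairs need an explicit Jaccard similarity check. All function names and parameter values are illustrative.

```python
import random
from collections import defaultdict
from itertools import combinations


def fingerprint(window, top_k=8):
    """Hypothetical toy fingerprint: positions and signs of the top_k
    largest sample-to-sample changes in the window (zero changes ignored)."""
    diffs = [b - a for a, b in zip(window, window[1:])]
    ranked = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]), reverse=True)
    return frozenset((i, diffs[i] > 0) for i in ranked[:top_k] if diffs[i] != 0)


def jaccard(a, b):
    """Jaccard similarity of two fingerprints; empty fingerprints never match."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def minhash_signature(fp, seeds):
    """One min-hash value per seed; similar sets tend to share min-hash values."""
    return tuple(min((hash((seed, item)) for item in fp), default=0) for seed in seeds)


def candidate_pairs(fingerprints, n_hashes=20, band_size=4, seed=0):
    """Banded LSH: windows whose signatures agree on a whole band share a bucket."""
    rng = random.Random(seed)
    seeds = [rng.getrandbits(32) for _ in range(n_hashes)]
    buckets = defaultdict(list)
    for idx, fp in enumerate(fingerprints):
        sig = minhash_signature(fp, seeds)
        for start in range(0, n_hashes, band_size):
            buckets[(start, sig[start:start + band_size])].append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


def similar_pairs(fingerprints, threshold=0.8, **lsh_options):
    """Similarity thresholding: keep only candidate pairs that really are similar."""
    return sorted((i, j) for i, j in candidate_pairs(fingerprints, **lsh_options)
                  if jaccard(fingerprints[i], fingerprints[j]) >= threshold)
```

Banding trades exactness for speed: with 20 hash values in bands of 4, two windows become candidates if any one band of 4 min-hash values matches exactly, which is likely for highly similar fingerprints and unlikely for dissimilar ones.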

Figure 2. Time evolution of seismicity of a cluster of earthquakes associated with hydraulic fracturing at 5 production wells, near the north end of the Guy-Greenbrier Fault. Colored circles show microearthquakes colored by time of occurrence. Rectangles in well laterals show injection stages also colored by time. The close correspondence in time and space of seismicity and injection could only be discerned through the detection and precise location of these abundant very small events.

“Denser Network” Method

Waveform similarity due to nearby sensors. The USC group (co-PI Yehuda Ben-Zion and graduate student Haoran Meng) used the similarity of adjacent recordings within a dense, temporary network near the San Jacinto Fault to detect small seismic events without using templates. In the course of their work and joint studies with Chris Johnson and Frank Vernon at Scripps, it became clear that weak ground motion with amplitudes comparable to or larger than those of small earthquakes is generated by various other sources, including airplanes, helicopters, and the interaction of wind with obstacles above the ground (Figure 3). These non-seismic sources produce earthquake-like and tremor-like signals that occupy 25% or more of the data, depending on location. Careful study is required to distinguish these human-made and natural sources from genuine earthquakes and tremor.

Figure 3. Dense array observations from an earthquake, an aircraft, an automobile, and wind. Top panels show ground motion with time, middle panels show ground motion amplitude with frequency, and bottom panels show amplitude with frequency (vertical scale) and time (horizontal scale). Detailed analyses of dense recordings were required to distinguish among these diverse sources of seismic waves (Meng et al., 2019).
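The bottom panels of Figure 3 are spectrograms: they show how a recording's amplitude is spread over frequency as time goes on, and spectral content is one of the clues that can help separate aircraft, cars, and wind from earthquakes. The sketch below shows how such a display can be computed, assuming Hann-tapered frames and a naive discrete Fourier transform written out for clarity (real processing would use an FFT routine). The function names and parameters are illustrative and are not those used by Meng et al.

```python
import cmath
import math


def dft_amplitudes(frame):
    """Amplitude of each non-negative frequency bin (naive O(n^2) DFT for clarity)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(signal, frame_len, hop, sample_rate):
    """Return (times, freqs, amps); amps[i][k] is the amplitude at freqs[k]
    in the frame that starts at times[i] seconds."""
    # Hann taper reduces leakage from the abrupt edges of each frame.
    taper = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    times, amps = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [x * w for x, w in zip(signal[start:start + frame_len], taper)]
        times.append(start / sample_rate)
        amps.append(dft_amplitudes(frame))
    freqs = [k * sample_rate / frame_len for k in range(frame_len // 2 + 1)]
    return times, freqs, amps


def dominant_frequencies(freqs, amps):
    """Frequency of the strongest bin in each frame."""
    return [freqs[max(range(len(row)), key=row.__getitem__)] for row in amps]
```

For a synthetic record sampled at 100 Hz that switches from a 10 Hz tone to a 30 Hz tone after one second, 0.5-second frames report a dominant frequency of 10 Hz for the first two frames and 30 Hz for the last two.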

The group at Georgia Tech (co-PI Zhigang Peng and graduate student Zefeng Li) developed a novel method for seismic event detection that can be applied to dense array measurements in which nearby stations record similar signals. The method uses local similarity among closely spaced stations of a 5200-station temporary deployment in Long Beach, California, to detect, with high confidence, very weak events that are below the noise level. Some of the detected signals have known sources, while others remain unknown (Figure 4).
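The local-similarity idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the Georgia Tech implementation: in each time window, every station's trace is correlated with those of its nearest neighbors, allowing a small time shift to absorb moveout between stations, and the best coefficients are averaged first per station and then across the array. Incoherent noise averages toward low similarity, while a signal that is coherent across neighboring stations raises the average. The neighbor lists, window length, lag range, threshold, and the synthetic test data are all hypothetical.

```python
import math
import random


def pearson(a, b):
    """Zero-lag correlation coefficient of two equal-length windows."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    na = math.sqrt(sum(x * x for x in da))
    nb = math.sqrt(sum(x * x for x in db))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / (na * nb)


def local_similarity(traces, neighbors, win, step, max_lag=1):
    """Return (window_start, array-averaged similarity) for each window.

    traces: one equal-length list of samples per station.
    neighbors: station index -> list of neighboring station indices.
    """
    n_samples = len(traces[0])
    result = []
    for start in range(max_lag, n_samples - win - max_lag + 1, step):
        per_station = []
        for s, nbrs in neighbors.items():
            ref = traces[s][start:start + win]
            # Best correlation with each neighbor over small time shifts.
            best = [max(pearson(ref, traces[k][start + lag:start + lag + win])
                        for lag in range(-max_lag, max_lag + 1))
                    for k in nbrs]
            per_station.append(sum(best) / len(best))
        result.append((start, sum(per_station) / len(per_station)))
    return result


def detect_windows(similarity, threshold):
    """Start samples of windows whose array-averaged similarity meets the threshold."""
    return [start for start, value in similarity if value >= threshold]


def make_synthetic_array(n_stations=4, n_samples=400, onset=200, amplitude=3.0, seed=42):
    """Hypothetical test data: Gaussian noise on each station plus one small
    event that arrives one sample later at each successive station."""
    rng = random.Random(seed)
    traces = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_stations)]
    for s in range(n_stations):
        for i in range(40):
            traces[s][onset + s + i] += amplitude * math.sin(2 * math.pi * i / 10)
    return traces
```

The synthetic event here is made deliberately clear so the result is easy to check; the benefit of averaging grows with the number of neighboring stations, which is part of what makes arrays with thousands of sensors attractive for finding weak events.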

Figure 4. Detected examples of local events in high-frequency seismic data. (a) Seismic signature of a vibroseis truck (a source designed for seismic imaging experiments) across the Long Beach array. (b) Seismic signature of a small nearby earthquake. (c) An unknown event, possibly associated with oil production near Long Beach. (d) An unknown event.

The examples above represent a small sample of the techniques developed and results obtained under the Mining Seismic Wavefields (MSW) project. An important part of the project is not only to develop these methods, but to make them broadly available.  

Towards that end, we are distributing computer programs through GitHub and these new seismicity catalogs through seismological data centers. Data-intensive computing approaches, such as these, have had limited impact in seismology, but that is changing as low-cost, capable sensor technology provides unparalleled spatial resolution of seismic wavefields. 

The work carried out under the Mining Seismic Wavefields (MSW) project will help realize the full potential of seismic networks of the future, and in doing so will provide a much clearer view into earthquake processes.

*  Now a Research Geophysicist at USGS, Pasadena

§  Now a postdoc at Harvard

‡  Now a postdoc at Caltech

NSF grant numbers are: EAR-1551462 and EAR-1818579.