SCEC Award Number 14232
Proposal Category Individual Proposal (Integration and Theory)
Proposal Title Efficient Similarity Search for Continuous Waveform Data
Investigator(s)
Name Organization
Gregory Beroza Stanford University
Other Participants
SCEC Priorities 1c, 2a, 2b SCEC Groups CME, CS, Seismology
Report Due Date 03/15/2015 Date Report Submitted N/A
Project Abstract
Template matching and related methods (Barrett and Beroza, 2014) require prior knowledge of the source signature. Search for signals with unknown signatures based on pair-wise matches (Brown et al., 2008) or a multiplicity of matches (Aguiar and Beroza, 2014) is possible; however, these approaches are naïve in that they directly compare all possible times with all others, and suffer from quadratic scaling with time. For problems of interest - decades of data recorded on hundreds of channels - they would overwhelm the most capable computers. We are developing techniques from data mining to implement scalable search for similar seismic signals. Our method converts waveforms into compact, diagnostic fingerprints. We then apply locality-sensitive hashing to the fingerprints to associate similar waveforms. This enables hierarchical search where we focus only on similar signals after they are sorted. Our method of Fingerprinting And Similarity Thresholding (FAST) exhibited similar results (detecting 21 of 24 events) to the autocorrelation method for waveform similarity (Table 1), but ran 160x faster on a test data set of one week of continuous waveform data from an earthquake sequence on the Calaveras Fault, and found previously uncataloged events. FAST will reach its full potential when applied to much larger data sets that would be impossible to analyze using other methods.
Intellectual Merit The potential impact of this project is profound. Data-intensive computing approaches have not yet had much impact in seismology. Addair et al. (2014) developed an efficient processing pipeline for waveform cross-correlation; however, their approach relies on distributed computing and parallelization rather than on fast algorithms and data mining. Moreover, they limit their analysis to previously known sources. Zhang et al. (2013) used similarity search for micro-earthquake analysis, but they use it to compare data with simulations, i.e., to improve the performance of grid search. Our application is, to our knowledge, the first of its kind for seismic data. Because seismic monitoring is foundational to seismology at the local, regional, and global scales, our work has the potential for global impact in earthquake monitoring. Cheap, capable sensor technology is poised to increase data rates dramatically, and earthquake seismology needs to prepare for this.
Broader Impacts This project involves three graduate students. One is a traditional geophysics graduate student. A second student comes to geophysics through a computational geoscience program. A third student is in the Stanford Institute for Computational and Mathematical Engineering. Introducing problems in seismology to students with such a broad array of backgrounds should help increase interest in the field.
Exemplary Figure Figure 1. Overview of feature extraction steps in FAST. (A) Continuous time series data. (B) Spectrogram: magnitude on a log scale. (C) Spectral images from two similar earthquakes, at times 1267 s and 1629 s. (D) Haar wavelet coefficients: magnitude on a log scale. (E) Signs of the Haar coefficients with the largest z-score deviations, retained after data compression. (F) Binary fingerprint: output of feature extraction. Note that similar spectral images result in similar fingerprints.
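The sketch below illustrates the feature-extraction sequence in Figure 1 (spectrogram, spectral images, Haar wavelet transform, sign of the largest z-score coefficients, binary fingerprint). It is a minimal sketch under assumed choices: scipy and PyWavelets stand in for the spectrogram and Haar transform, and the window lengths, overlap, and number of retained coefficients (k) are placeholders, not the values used in the actual FAST pipeline.

```python
# Sketch of the fingerprint extraction illustrated in Figure 1.
# nperseg, image_len, step, and k are illustrative placeholders.
import numpy as np
from scipy.signal import spectrogram
import pywt

def spectral_images(trace, fs, nperseg=64, image_len=100, step=10):
    """(A-C) Spectrogram of the continuous trace, cut into overlapping spectral images."""
    _, _, sxx = spectrogram(trace, fs=fs, nperseg=nperseg)
    logspec = np.log10(sxx + 1e-12)
    starts = range(0, logspec.shape[1] - image_len + 1, step)
    return np.array([logspec[:, s:s + image_len] for s in starts])

def binary_fingerprints(images, k=400):
    """(D-F) Haar-transform each image, z-score each coefficient across images,
    and keep the signs of the top-k |z| coefficients as a sparse binary fingerprint."""
    coeff_arrays = []
    for img in images:
        arr, _ = pywt.coeffs_to_array(pywt.wavedec2(img, 'haar'))
        coeff_arrays.append(arr.ravel())
    coeffs = np.array(coeff_arrays)
    z = (coeffs - coeffs.mean(axis=0)) / (coeffs.std(axis=0) + 1e-12)
    n_coef = coeffs.shape[1]
    fingerprints = np.zeros((coeffs.shape[0], 2 * n_coef), dtype=bool)
    for i, zi in enumerate(z):
        top = np.argsort(np.abs(zi))[-k:]
        pos, neg = top[zi[top] > 0], top[zi[top] < 0]
        fingerprints[i, pos] = True              # bits encoding strong positive coefficients
        fingerprints[i, n_coef + neg] = True     # bits encoding strong negative coefficients
    return fingerprints
```

Encoding the sign of each retained coefficient in two bit planes keeps the fingerprint binary and sparse, which is what makes the hashing step in the similarity search efficient.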