SCEC Award Number 14232
Proposal Category Individual Proposal (Integration and Theory)
Proposal Title Efficient Similarity Search for Continuous Waveform Data
Investigator(s)
Name Organization
Gregory Beroza Stanford University
Other Participants
SCEC Priorities 1c, 2a, 2b SCEC Groups CME, CS, Seismology
Report Due Date 03/15/2015 Date Report Submitted N/A
Project Abstract
Template matching and related methods (Barrett and Beroza, 2014) require prior knowledge of the source signature. Search for signals with unknown signatures based on pair-wise matches (Brown et al., 2008) or a multiplicity of matches (Aguiar and Beroza, 2014) is possible; however, these approaches are naïve in that they directly compare all possible times with all others, and suffer from quadratic scaling with time. For problems of interest - decades of data recorded on hundreds of channels - they would overwhelm the most capable computers. We are developing techniques from data mining to implement scalable search for similar seismic signals. Our method converts waveforms into compact, diagnostic fingerprints. We then apply locality-sensitive hashing to the fingerprints to associate similar waveforms. This enables hierarchical search where we focus only on similar signals after they are sorted. Our method of Fingerprinting And Similarity Thresholding (FAST) exhibited similar results (detecting 21 of 24 events) to the autocorrelation method for waveform similarity (Table 1), but ran 160x faster on a test data set of one week of continuous waveform data from an earthquake sequence on the Calaveras Fault, and found previously uncataloged events. FAST will reach its full potential when applied to much larger data sets that would be impossible to analyze using other methods.
Intellectual Merit The potential impact of this project is profound. Data-intensive computing approaches have not yet had much impact in seismology. Addair et al. (2014) developed an efficient processing pipeline for waveform cross-correlation; however, their approach relies on distributed computing and parallelization rather than on fast algorithms and data mining. Moreover, they limit their analysis to previously known sources. Zhang et al. (2013) used similarity search for micro-earthquake analysis, but they use it to compare data with simulations, i.e., to improve the performance of grid search. Our application is, to our knowledge, the first of its kind for seismic data. Because seismic monitoring is foundational to seismology at the local, regional, and global scales, our work has the potential for global impact in earthquake monitoring. Cheap, capable sensor technology is poised to increase data rates dramatically, and earthquake seismology needs to prepare for this.
Broader Impacts This project involves three graduate students. One is a traditional geophysics graduate student. A second student comes to geophysics through a computational geoscience program. A third student is in the Stanford Institute for Computational and Mathematical Engineering. Introducing problems in seismology to students with such a broad array of backgrounds should help increase interest in the field.
Exemplary Figure Figure 1. Overview of feature extraction steps in FAST. (A) Continuous time series data. (B) Spectrogram: magnitude on a log scale. (C) Spectral images from two similar earthquakes, at times 1267 s and 1629 s. (D) Haar wavelet coefficients: magnitude on a log scale. (E) Signs of the Haar coefficients with the largest z-score deviations, retained after data compression. (F) Binary fingerprint: output of feature extraction. Note that similar spectral images result in similar fingerprints.
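The sketch below illustrates the feature-extraction sequence in Figure 1 (spectrogram, spectral images, Haar wavelet transform, sign of the largest z-score coefficients, binary fingerprint). It is a minimal sketch under assumed choices: scipy and PyWavelets stand in for the spectrogram and Haar transform, and the window lengths, overlap, and number of retained coefficients (k) are placeholders, not the values used in the actual FAST pipeline.

```python
# Sketch of the fingerprint extraction illustrated in Figure 1.
# nperseg, image_len, step, and k are illustrative placeholders.
import numpy as np
from scipy.signal import spectrogram
import pywt

def spectral_images(trace, fs, nperseg=64, image_len=100, step=10):
    """(A-C) Spectrogram of the continuous trace, cut into overlapping spectral images."""
    _, _, sxx = spectrogram(trace, fs=fs, nperseg=nperseg)
    logspec = np.log10(sxx + 1e-12)
    starts = range(0, logspec.shape[1] - image_len + 1, step)
    return np.array([logspec[:, s:s + image_len] for s in starts])

def binary_fingerprints(images, k=400):
    """(D-F) Haar-transform each image, z-score each coefficient across images,
    and keep the signs of the top-k |z| coefficients as a sparse binary fingerprint."""
    coeff_arrays = []
    for img in images:
        arr, _ = pywt.coeffs_to_array(pywt.wavedec2(img, 'haar'))
        coeff_arrays.append(arr.ravel())
    coeffs = np.array(coeff_arrays)
    z = (coeffs - coeffs.mean(axis=0)) / (coeffs.std(axis=0) + 1e-12)
    n_coef = coeffs.shape[1]
    fingerprints = np.zeros((coeffs.shape[0], 2 * n_coef), dtype=bool)
    for i, zi in enumerate(z):
        top = np.argsort(np.abs(zi))[-k:]
        pos, neg = top[zi[top] > 0], top[zi[top] < 0]
        fingerprints[i, pos] = True              # bits encoding strong positive coefficients
        fingerprints[i, n_coef + neg] = True     # bits encoding strong negative coefficients
    return fingerprints
```

Encoding the sign of each retained coefficient in two bit planes keeps the fingerprint binary and sparse, which is what makes the hashing step in the similarity search efficient.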