Manage I/O Task in a Normalized Cross-Correlation Earthquake Detection Code for Large Seismic Datasets

Dawei Mu, Pietro Cicotti, & Yifeng Cui

Published August 15, 2017, SCEC Contribution #7711, 2017 SCEC Annual Meeting Poster #276

We have developed high-performance GPU-based software called “cuda Normalized Cross-Correlation” (cuNCC) for calculating seismic waveform similarity in applications such as hypocenter estimation and small-earthquake detection. We present the performance and I/O optimizations applied in the cuNCC code.
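For readers unfamiliar with the quantity being computed, the following is a minimal sequential sketch of normalized cross-correlation between one template and one continuous trace. It is not the cuNCC implementation: the function name `ncc`, the zero-mean (Pearson) form of the normalization, and the choice to return 0 for flat windows are all assumptions made for illustration.

```python
import math


def ncc(template, trace):
    """Slide `template` along `trace` and return one similarity
    coefficient in [-1, 1] per window position (hypothetical helper)."""
    n = len(template)
    t_mean = sum(template) / n
    t = [x - t_mean for x in template]          # demeaned template
    t_norm = math.sqrt(sum(x * x for x in t))

    coeffs = []
    for start in range(len(trace) - n + 1):
        window = trace[start:start + n]
        w_mean = sum(window) / n
        w = [x - w_mean for x in window]        # demeaned window
        w_norm = math.sqrt(sum(x * x for x in w))
        denom = t_norm * w_norm
        if denom == 0.0:
            # Flat template or flat window: similarity is undefined,
            # so this sketch reports 0.0 (an assumption).
            coeffs.append(0.0)
        else:
            coeffs.append(sum(a * b for a, b in zip(t, w)) / denom)
    return coeffs
```

Each window position costs a dot product plus a normalization over the template length, which is why running many templates against long continuous records is computationally demanding.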

Our GPU-based template matching algorithm (TMA) is designed to make full use of the fast on-board and on-chip memory of modern GPU architectures, including registers, constant memory, and shared memory. In applications involving many templates, the algorithm achieves high efficiency by introducing a new data-reuse feature into the algorithmic design. cuNCC reaches 2912 Gflop/s on a single Pascal P100 GPU, a speedup of more than 1,600x over a common sequential CPU code.

I/O efficiency became a significant bottleneck in cuNCC’s overall performance. Our I/O benchmarking results show that using a shared-memory virtual filesystem as a buffer for cuNCC output gives the best I/O efficiency, especially when the similarity coefficients are an intermediate result for subsequent computation. When a shared-memory virtual filesystem is unavailable, we recommend using CPU memory as a buffer to reduce disk access frequency on low-bandwidth I/O devices. For high-bandwidth I/O devices, we suggest writing results directly to storage without a buffering scheme.
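The CPU-memory buffering recommendation can be illustrated with a small host-side sketch. It is an assumption-laden example, not the cuNCC output path: the class name `BufferedResultWriter`, the little-endian float32 record layout, and the default 1 MiB threshold are all hypothetical.

```python
import struct


class BufferedResultWriter:
    """Accumulate similarity coefficients in host memory and write them
    to the output file only once a size threshold is reached."""

    def __init__(self, path, buffer_bytes=1 << 20):
        self._file = open(path, "wb")
        self._buffer = bytearray()
        self._buffer_bytes = buffer_bytes   # flush threshold (assumed value)
        self.disk_writes = 0                # number of write calls issued

    def write_coefficients(self, coeffs):
        # Pack as little-endian float32 (assumed on-disk layout).
        self._buffer += struct.pack(f"<{len(coeffs)}f", *coeffs)
        if len(self._buffer) >= self._buffer_bytes:
            self.flush()

    def flush(self):
        if self._buffer:
            self._file.write(self._buffer)
            self.disk_writes += 1
            self._buffer.clear()

    def close(self):
        self.flush()
        self._file.close()
```

A larger `buffer_bytes` trades host memory for fewer, larger writes, which matters most on low-bandwidth devices. On Linux systems that expose a memory-backed mount (commonly `/dev/shm`, though that depends on the system), pointing `path` at that mount is one way to try the shared-memory filesystem approach described above.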

We performed a realistic production run to evaluate the cuNCC code, using a total of 21,325 template waveforms with 256 samples each. The seismogram dataset consists of all continuous recordings from 43 stations over 4 weeks. The entire TMA process involves over 4 trillion NCC calculations. Our GPU-based cuNCC took 26 minutes on the Pascal P100; by comparison, an optimized parallel CPU code would take 21 hours on an 18-core Xeon E7-8867. As a science application, the number of aftershocks detected using the new TMA code is more than 4 times the number of aftershocks cataloged by the Central Weather Bureau in Taiwan.
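The run times quoted above imply a speedup and a throughput that can be worked out directly. The short calculation below uses only the figures stated in the abstract and treats "over 4 trillion" as a lower bound; the function names are hypothetical.

```python
GPU_MINUTES = 26        # cuNCC on one Pascal P100
CPU_HOURS = 21          # optimized parallel code on the 18-core Xeon
NCC_COUNT_MIN = 4e12    # "over 4 trillion", so a lower bound


def speedup(cpu_hours, gpu_minutes):
    """Ratio of CPU wall time to GPU wall time."""
    return cpu_hours * 60 / gpu_minutes


def ncc_per_second(ncc_count, minutes):
    """Average number of NCC calculations completed per second."""
    return ncc_count / (minutes * 60)


gpu_vs_cpu = speedup(CPU_HOURS, GPU_MINUTES)              # about 48x
gpu_rate = ncc_per_second(NCC_COUNT_MIN, GPU_MINUTES)     # at least ~2.6e9 per second
```

This end-to-end ratio of roughly 48x is measured against an optimized 18-core parallel code, so it is not directly comparable to the 1,600x figure quoted against a sequential CPU code.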

Key Words
earthquake detection, I/O, CUDA

Citation
Mu, D., Cicotti, P., & Cui, Y. (2017, 08). Manage I/O Task in a Normalized Cross-Correlation Earthquake Detection Code for Large Seismic Datasets. Poster Presentation at 2017 SCEC Annual Meeting.


Related Projects & Working Groups
Computational Science (CS)