GPU Acceleration of Hercules

Patrick Small, Ricardo Taborda, Jacobo Bielak, & Thomas H. Jordan

Published 2014, SCEC Contribution #6048

This work describes one of the recent SCEC efforts to advance the use of general-purpose graphics processing units (GPGPUs) in physics-based earthquake simulation software. In particular, we report on performance improvements achieved using GPGPU-oriented coding in Hercules, a SCEC-supported 3D earthquake ground motion simulator originally developed by the Quake Group at Carnegie Mellon University and currently part of the SEISM Project’s High-F Simulation Platform. Hercules is a parallel simulation code written in the standard C programming language. It uses the Message Passing Interface (MPI) libraries to manage inter-processor communications and an octree-based backbone to manage unstructured finite-element meshes. In its original form, Hercules has been thoroughly tested on multiple high-performance computing systems, including Kraken (NICS, now decommissioned) and Blue Waters (NCSA), where it has shown near-excellent scalability. Hercules has also been used in multiple verification and validation simulations led by SCEC, including the TeraShake and ShakeOut scenarios, and in simulations of historical events such as the 1994 Northridge and the 2008 Chino Hills earthquakes. Using CUDA, a parallel computing platform and programming model designed to work with NVIDIA graphics processing units, we have moved several computationally intensive physics calculations within Hercules to the GPU, resulting in greatly improved runtime performance. This implementation has been tested on multiple systems with hybrid CPU-GPU architectures, including USC’s HPCC system, NCSA’s Blue Waters, and OLCF’s Titan, one of the fastest supercomputers in the world, housed at Oak Ridge National Laboratory. The latest tests executed on Titan, in particular, correspond to simulations of the 2008 Chino Hills and the 2014 La Habra earthquakes. Our results for Chino Hills are consistent with previous simulations done using the CPU-only version of Hercules.
Results for La Habra were used in a recent verification and validation effort that is still underway. Computationally speaking, the implementation of the GPU modules in Hercules shows performance improvements on the order of 2.5x in the overall solver execution time, and higher factors for individual computing modules. These improvements are important because they will allow us to tackle larger and more detailed problems in the future.

Small, P., Taborda, R., Bielak, J., & Jordan, T. H. (2014). GPU Acceleration of Hercules. Poster Presentation at 2014 SCEC Annual Meeting.

Related Projects & Working Groups
Computational science