
Tuning AWP-ODC-OS for efficient, scalable performance on manycore architectures

David Lenz, Josh Tobin, Alexander N. Breuer, Alexander Heinecke, Charles Yount, & Yifeng Cui

Published August 15, 2017, SCEC Contribution #7813, 2017 SCEC Annual Meeting Poster #279

AWP-ODC-OS is open-source software that simulates seismic wave propagation following a fault rupture using a staggered-grid finite-difference method. Widely used within the SCEC community, AWP-ODC-OS is now highly tuned for two important architectures: the Intel Xeon Phi manycore processor and the NVIDIA Tesla GPU.
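To illustrate what "staggered grid" means here, the sketch below advances the 1D velocity-stress equations (ρ ∂v/∂t = ∂σ/∂x, ∂σ/∂t = μ ∂v/∂x). Velocity lives on integer grid nodes, stress lives on half nodes, and the two fields are updated alternately in a leapfrog fashion. This is a minimal illustration only: it assumes second-order stencils, rigid boundaries, and made-up material values (`rho`, `mu`). It is not the AWP-ODC-OS kernel, which solves the full 3D problem.

```python
import numpy as np


def staggered_1d(nx=200, nt=300, dx=10.0, rho=2700.0, mu=3.0e10, cfl=0.5):
    """1D velocity-stress leapfrog on a staggered grid (illustrative sketch).

    v[i] sits at x = i*dx (integer nodes); s[i] sits at x = (i + 1/2)*dx
    (half nodes). Material values and boundary treatment are assumptions.
    """
    c = np.sqrt(mu / rho)        # shear-wave speed
    dt = cfl * dx / c            # time step; cfl must stay below 1 for stability
    v = np.zeros(nx)
    s = np.zeros(nx - 1)
    v[nx // 2] = 1.0             # initial velocity pulse at the center
    for _ in range(nt):
        # velocity at interior integer nodes from the stress difference across it
        v[1:-1] += dt / (rho * dx) * (s[1:] - s[:-1])
        # stress at every half node from the velocity difference across it
        s += dt * mu / dx * (v[1:] - v[:-1])
    # v[0] and v[-1] are never updated: rigid (zero-velocity) boundaries
    return v, s, dt, c
```

Staggering the fields means each centered difference spans exactly one grid cell, which is what makes this scheme compact and accurate. Rerunning with `cfl` above 1 makes the fields grow without bound, which is why the time step is tied to the grid spacing and wave speed.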

We demonstrate that AWP-ODC-OS runs efficiently on Intel Xeon Phi clusters in full-scale runs, with performance comparable to that of top-of-the-line GPU clusters. Our improvements for the second-generation Xeon Phi processor, codenamed Knights Landing (KNL), span the entire optimization spectrum: we increased vector parallelism at the register level, leveraged KNL's new high-bandwidth memory, and added a custom task scheduler that overlaps computation with communication.
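Overlapping computation with communication typically follows a common pattern: start the halo (ghost-cell) exchange, update the interior points that do not depend on ghost cells, then wait for the exchange and finish the boundary points. The sketch below shows that generic pattern with a worker thread standing in for a non-blocking MPI exchange. The names `fake_halo_exchange`, `step_overlapped`, and `step_blocking`, the 3-point averaging stencil, and the latency value are all hypothetical; this is not the custom task scheduler used in AWP-ODC-OS.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def stencil(u, lo, hi):
    """3-point average for owned indices [lo, hi); reads u[lo-1] through u[hi]."""
    return (u[lo - 1:hi - 1] + u[lo:hi] + u[lo + 1:hi + 1]) / 3.0


def fake_halo_exchange(u, left_val, right_val, latency=0.01):
    """Hypothetical stand-in for a non-blocking neighbor exchange."""
    time.sleep(latency)                # pretend network latency
    u[0], u[-1] = left_val, right_val  # fill the two ghost cells


def step_overlapped(u, left_val, right_val, pool):
    """u[0] and u[-1] are ghost cells; u[1..n] are the owned points."""
    n = len(u) - 2
    out = np.empty(n)
    fut = pool.submit(fake_halo_exchange, u, left_val, right_val)  # start "comm"
    out[1:n - 1] = stencil(u, 2, n)    # interior: never reads a ghost cell
    fut.result()                       # wait for the halo (re-raises errors)
    out[0] = stencil(u, 1, 2)[0]       # boundary points need the fresh ghosts
    out[n - 1] = stencil(u, n, n + 1)[0]
    return out


def step_blocking(u, left_val, right_val):
    """Reference version: exchange first, then compute everything."""
    fake_halo_exchange(u, left_val, right_val)
    return stencil(u, 1, len(u) - 1)
```

For example, `step_overlapped(np.arange(10.0), -1.0, 99.0, pool)` inside `with ThreadPoolExecutor(max_workers=1) as pool:` gives the same result as `step_blocking` on a copy of the same array; the difference is only that the interior work can proceed while the exchange is in flight.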

Comparing AWP-ODC-OS on a single Intel KNL 7910 node against a single NVIDIA K20X node and a single NVIDIA P100 node, we found that the KNL node achieved 2.85 times the performance of the K20X and 98% of the performance of the P100. Performance on KNL nodes also scales well: in a weak-scaling study on Cori Phase II, we observed a parallel efficiency above 90% when scaling from 1 to 9,000 nodes.
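The reported figures can be related with simple arithmetic. The helpers below assume weak-scaling efficiency is defined as the single-node time per step divided by the N-node time per step (work per node held fixed); the poster may use an equivalent throughput-based definition. The function names are illustrative only.

```python
def weak_scaling_efficiency(t_1, t_n):
    """Weak-scaling efficiency: work per node is fixed, so ideally t_n == t_1."""
    return t_1 / t_n


def max_slowdown(efficiency):
    """Largest growth in time per step (t_n / t_1) consistent with an efficiency."""
    return 1.0 / efficiency


def implied_p100_vs_k20x(knl_vs_k20x=2.85, knl_vs_p100=0.98):
    """P100 speed relative to K20X implied by the two reported KNL ratios."""
    return knl_vs_k20x / knl_vs_p100
```

With the reported numbers, `implied_p100_vs_k20x()` is about 2.91, so the two single-node comparisons together imply that the P100 node ran roughly 2.9 times faster than the K20X node. Likewise, `max_slowdown(0.90)` is about 1.11: an efficiency above 90% means the time per step on 9,000 nodes stayed within roughly 11% of the single-node time.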

To ensure continued robust development of AWP-ODC-OS, we have implemented continuous delivery pipelines with GoCD that help maintain the correctness of the software whenever the code changes. Static code analysis, build checks, memory checks, and undefined-behavior checks run automatically after every committed change, and the framework can be extended with more sophisticated testing after each major commit. We conclude with a discussion of current developments that will allow automated benchmarking on XSEDE computing resources.
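The core idea of such a pipeline is a sequence of check stages that runs on every commit and stops at the first failure. The sketch below expresses that idea as a plain Python stage runner; it is not GoCD's configuration format or API. `run_pipeline` is a hypothetical name, and the commands in `EXAMPLE_STAGES` (including the `./solver` binary and `src/` path) are placeholders chosen to mirror the kinds of checks listed above, not the project's actual pipeline.

```python
import subprocess
import sys

# Placeholder stages, loosely mirroring the checks named in the text.
# These commands are assumptions, not the project's real GoCD configuration.
EXAMPLE_STAGES = [
    ("static-analysis", ["cppcheck", "--enable=all", "src/"]),
    ("build-ubsan", ["make", "CFLAGS=-fsanitize=undefined"]),
    ("memcheck", ["valgrind", "--error-exitcode=1", "./solver"]),
]


def run_pipeline(stages):
    """Run (name, command) stages in order and stop at the first failure.

    Returns (names_of_passed_stages, name_of_failed_stage_or_None).
    A missing executable counts as a failure.
    """
    passed = []
    for name, cmd in stages:
        try:
            returncode = subprocess.run(cmd).returncode
        except FileNotFoundError:
            returncode = 127  # conventional "command not found" status
        if returncode != 0:
            print(f"stage {name!r} failed with status {returncode}", file=sys.stderr)
            return passed, name
        passed.append(name)
    return passed, None
```

Stopping at the first failing stage keeps feedback fast: an error caught by static analysis is reported without waiting for the slower memory checks.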

Key Words
3D Wave Propagation, GPU, Xeon Phi, KNL

Citation
Lenz, D., Tobin, J., Breuer, A. N., Heinecke, A., Yount, C., & Cui, Y. (2017, 08). Tuning AWP-ODC-OS for efficient, scalable performance on manycore architectures. Poster Presentation at 2017 SCEC Annual Meeting.


Related Projects & Working Groups
Computational Science (CS)