Petaflop Seismic Simulations in the Public Cloud

Alexander N. Breuer, Yifeng Cui, & Alexander Heinecke

Accepted June 19, 2019, SCEC Contribution #9083

During the last decade, cloud services and infrastructure as a service became a popular solution for diverse applications. Additionally, hardware support for virtualization closed performance gaps compared to on-premises, bare-metal systems. This development is driven by offloaded hypervisors and full CPU virtualization. Today’s cloud service providers, such as Amazon or Google, offer the ability to assemble application-tailored clusters to maximize performance. However, from an interconnect point of view, one has to tackle a 4-5× slow-down in terms of bandwidth and 25× in terms of latency, compared to the latest high-speed and low-latency interconnects. Taking into account the high per-node and accelerator-driven performance of the latest supercomputers, we observe that the network bandwidth performance of recent cloud offerings is within 2× of large supercomputers. To address these challenges, we present a comprehensive application-centric approach for high-order seismic simulations utilizing the ADER discontinuous Galerkin finite element method, which exhibits excellent communication characteristics. This covers the tuning of the operating system, normally not possible on supercomputers, micro-benchmarking, and finally, the efficient execution of our solver in the public cloud. Due to this performance-oriented end-to-end workflow, we were able to achieve 1.09 PFLOPS on 768 AWS c5.18xlarge instances, offering 27,648 cores with 5 PFLOPS of theoretical computational power. This corresponds to an achieved peak efficiency of over 20% and a close-to 90% parallel efficiency in a weak scaling setup. In terms of strong scalability, we were able to strong-scale a science scenario from 2 to 64 instances with 60% parallel efficiency. This work is, to the best of our knowledge, the first of its kind at such a large scale.
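
The headline figures in the abstract can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes the conventional definitions of peak efficiency (achieved over theoretical FLOPS) and strong-scaling parallel efficiency (speedup divided by the increase in instance count), which the abstract does not spell out. All input numbers are taken from the abstract; the implied speedup is derived under that assumed definition and is not a figure reported by the authors.

```python
# Arithmetic check of the figures quoted in the abstract.
# Function names are hypothetical; definitions are the conventional ones
# and are assumed here, not confirmed by the abstract.

def peak_efficiency(achieved_pflops: float, theoretical_pflops: float) -> float:
    """Fraction of theoretical peak performance actually sustained."""
    return achieved_pflops / theoretical_pflops


def strong_scaling_speedup(efficiency: float, n_base: int, n_scaled: int) -> float:
    """Speedup implied by a parallel efficiency when scaling from
    n_base to n_scaled instances on a fixed problem size."""
    ideal_speedup = n_scaled / n_base
    return efficiency * ideal_speedup


# Values stated in the abstract.
instances = 768
total_cores = 27_648
achieved_pflops = 1.09
theoretical_pflops = 5.0

# Cores per instance follow from the two stated totals.
cores_per_instance = total_cores // instances

efficiency = peak_efficiency(achieved_pflops, theoretical_pflops)

# Strong scaling from 2 to 64 instances at 60% parallel efficiency
# (derived value under the assumed definition).
implied_speedup = strong_scaling_speedup(0.60, n_base=2, n_scaled=64)

print(f"cores per instance: {cores_per_instance}")
print(f"peak efficiency:    {efficiency:.1%}")
print(f"implied speedup:    {implied_speedup:.1f}x of an ideal {64 // 2}x")
```

Under these assumptions, the stated totals give 36 cores per instance and a peak efficiency of roughly 22%, consistent with the "over 20%" quoted above.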

Breuer, A. N., Cui, Y., & Heinecke, A. (2019, 06). Petaflop Seismic Simulations in the Public Cloud. Oral Presentation at ISC High Performance 2019. http://dial3343.org/pub/papers/19_03_26_isc_19.pdf