Prioritizing Ground‐Motion Validation Metrics Using Semisupervised and Supervised Learning

Naeem Khoshnevis & Ricardo Taborda

Published June 26, 2018, SCEC Contribution #8015

It has become common practice to validate ground motion simulations based on a variety of time and frequency metrics scaled to quantify the level of agreement between synthetics and data or other reference solutions. There is, however, no agreement about the importance or weight that ought to be given to each metric. As a result, metric selection is often subjective, based either on intended applications or on personal preferences. As a consequence, it is difficult for simulators to identify what modeling improvements are needed, which would be easier if they could focus on a reduced number of metrics. We present an analysis that examines eleven ground motion validation metrics using semi-supervised and supervised machine learning techniques. These techniques help label and classify goodness-of-fit results with the objective of prioritizing and narrowing the choice of these metrics. In particular, we use a validation dataset from a series of physics-based ground motion simulations performed for the 2008 Mw 5.4 Chino Hills, California, earthquake. We study the relationships that exist between the eleven metrics, and carry out a process in which these metrics are treated as dimensions of a multi-dimensional space. We use a constrained k-means method, and conduct a subspace clustering analysis to address the implicit high-dimensional effects. This allows us to label the data in our dataset into four validation categories (poor, fair, good, excellent) following previous studies. We then develop a family of decision trees using the C5.0 algorithm, from which we select a few trees that help narrow the number of metrics leading to a validation prediction into the four referenced categories. These decision trees can be understood as rapid predictors of the quality of a simulation, or as data-informed classifiers that can help prioritize validation metrics.
Our analysis, although limited to the particular dataset used here, indicates that among the eleven metrics considered, the acceleration response spectra and total energy of velocity are the most dominant, followed by the peak ground response in terms of acceleration and velocity.
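
The two-stage idea described above (cluster goodness-of-fit scores into four categories, then ask which metrics best predict those categories) can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain k-means with a deterministic quantile seeding stands in for the constrained k-means and subspace clustering, and a single-split information-gain ranking stands in for the C5.0 decision trees. The data are synthetic, and the metric names (`metric_a`, `metric_b`, `noise_metric`), the score range, and all helper functions are hypothetical and chosen only for illustration.

```python
import math
import random

LABELS = ["poor", "fair", "good", "excellent"]
METRICS = ["metric_a", "metric_b", "noise_metric"]  # hypothetical names

# Synthetic scores (assumption): two metrics track a latent quality level,
# the third is unrelated noise centered on a mid-range score.
rng = random.Random(42)
points = []
for quality in (1.0, 3.5, 6.0, 8.5):
    for _ in range(50):
        points.append([
            quality + rng.gauss(0.0, 0.4),
            quality + rng.gauss(0.0, 0.4),
            5.0 + rng.gauss(0.0, 0.5),
        ])

def kmeans(data, k, iters=50):
    """Plain k-means; centers are seeded at quantiles of the summed score."""
    ordered = sorted(data, key=sum)
    centers = [ordered[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group (keep it if empty).
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    assign = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in data]
    return centers, assign

def entropy(labels):
    n = len(labels)
    return -sum(
        (labels.count(v) / n) * math.log2(labels.count(v) / n) for v in set(labels)
    )

def best_gain(values, labels):
    """Largest information gain over all single-threshold splits of one metric."""
    base = entropy(labels)
    best = 0.0
    for t in sorted(set(values))[1:]:
        left = [l for v, l in zip(values, labels) if v < t]
        right = [l for v, l in zip(values, labels) if v >= t]
        split = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        best = max(best, base - split)
    return best

# Stage 1: cluster, then name clusters from lowest to highest mean score.
centers, assign = kmeans(points, k=4)
order = sorted(range(4), key=lambda c: sum(centers[c]))
names = {c: LABELS[rank] for rank, c in enumerate(order)}
labels = [names[c] for c in assign]

# Stage 2: rank metrics by how well one split on each predicts the labels.
gains = [best_gain([p[j] for p in points], labels) for j in range(len(METRICS))]
for name, gain in sorted(zip(METRICS, gains), key=lambda x: -x[1]):
    print(f"{name}: information gain = {gain:.3f}")
```

In this toy setting the noise metric carries little information about the cluster labels, so it ranks last. That is the sense in which a classifier trained on clustered labels can help prioritize metrics, although a full decision tree considers splits jointly rather than one metric at a time.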

Khoshnevis, N., & Taborda, R. (2018). Prioritizing Ground‐Motion Validation Metrics Using Semisupervised and Supervised Learning. Bulletin of the Seismological Society of America, 108(4), 2248-2264. doi: 10.1785/0120180056.