An application of machine learning techniques to the evaluation of goodness-of-fit scores used in earthquake ground motion validation

Naeem Khoshnevis & Ricardo Taborda

Published August 14, 2017, SCEC Contribution #7579, 2017 SCEC Annual Meeting Poster #237

We present an alternative approach to defining a goodness-of-fit (GOF) scoring system that uses machine learning techniques to evaluate the metrics commonly employed to validate synthetic seismograms from earthquake ground motion simulations with respect to data. Over the last decade, different metrics have been defined to estimate GOF scores, commonly characterized by numerical, normalized scales (e.g., 0 to 10) that quantify the level of similarity between any two given signals in both time and frequency. These metrics are, however, often combined arbitrarily, or used selectively based on intended applications or personal preferences. This has made it difficult to reach a well-informed consensus about their appropriate selection and use. We propose to use data-mining and machine learning techniques to better inform such decision making about ground motion validation. To that end, we rely on a dataset of existing validation results from previous physics-based (deterministic) earthquake ground motion simulations done for the greater Los Angeles region. Our dataset involves comparisons over 300 stations, using 11 different metrics, and 3 simulation sets for different velocity models. In machine learning terms, our data as collected are unlabeled. We use semi-supervised learning techniques to label the data, and then we use supervised learning to develop a validation prediction model. In particular, we use a constrained k-means clustering method, in which we define 4 hypothetical stations with scores 3, 5, 7, and 9 for all metrics. We place these stations under cannot-link constraints, so that no two of them can fall in the same cluster, and conduct a large set of semi-supervised subspace analyses in 2, 3, and 4 dimensions to label the current dataset. Once the dataset has been labeled, we develop a decision tree using the C5.0 algorithm and introduce a simple, yet effective model to accurately classify a simulation into four groups (poor, fair, good, and excellent) based on a reduced number of metrics.
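The labeling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified constrained k-means in which each of the four hypothetical stations (scores 3, 5, 7, and 9 on every metric) seeds one cluster and stays pinned to it, which enforces the cannot-link constraints among them. The function names, the demo scores, and the mapping of clusters to the four labels are assumptions made for illustration. The subspace sweep in 2, 3, and 4 dimensions and the C5.0 decision tree are not covered.

```python
# Simplified constrained k-means, pure standard library.
# Assumption: the four hypothetical stations act as fixed cluster seeds.
LABELS = ["poor", "fair", "good", "excellent"]
ANCHOR_SCORES = [3.0, 5.0, 7.0, 9.0]  # hypothetical stations from the abstract


def sq_dist(a, b):
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of score vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def constrained_kmeans(stations, n_metrics, max_iter=100):
    """Label each station by clustering around the four anchor stations.

    Anchor k always belongs to cluster k, so no two anchors ever share
    a cluster (the cannot-link constraint). Real stations are free to move.
    """
    anchors = [[s] * n_metrics for s in ANCHOR_SCORES]
    centroids = [list(a) for a in anchors]
    assign = None
    for _ in range(max_iter):
        # Assignment step: each real station goes to its nearest centroid.
        new_assign = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in stations
        ]
        if new_assign == assign:
            break  # converged: no station changed cluster
        assign = new_assign
        # Update step: the pinned anchor is always part of its own cluster.
        for k in range(len(centroids)):
            members = [p for p, a in zip(stations, assign) if a == k]
            members.append(anchors[k])
            centroids[k] = mean(members)
    return [LABELS[a] for a in assign], centroids


if __name__ == "__main__":
    # Made-up GOF scores for six stations and three metrics (not real data).
    demo = [[2.5] * 3, [3.4] * 3, [5.1] * 3, [6.9] * 3, [9.2] * 3, [8.8] * 3]
    labels, _ = constrained_kmeans(demo, n_metrics=3)
    for scores, label in zip(demo, labels):
        print(scores, label)
```

Pinning the anchors keeps the four clusters ordered from low to high scores, which is what allows them to be read as poor, fair, good, and excellent. A general constrained k-means would instead check the constraints on every assignment.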

Key Words
Constrained k-means clustering, Decision tree, Ground motion simulation, Validation

Khoshnevis, N., & Taborda, R. (2017, 08). An application of machine learning techniques to the evaluation of goodness-of-fit scores used in earthquake ground motion validation. Poster Presentation at 2017 SCEC Annual Meeting.

Related Projects & Working Groups
Earthquake Engineering Implementation Interface (EEII)