
SCEC CSEP Workshop: Final Evaluation of the Regional Earthquake Likelihood Models (RELM) Experiment and the Future of Earthquake Forecasting

Conveners: Ned Field, Tom Jordan, Andy Michael, Danijel Schorlemmer, Max Werner, and Jeremy Zechar
Dates: June 6-7, 2012
Location: Rancho Las Palmas Resort, Rancho Mirage, CA (Salons B/C)
SCEC Award and Report: 12169

CSEP-RELM Workshop Participants

PARTICIPANTS: Peter Bird (UCLA), Mike Blanpied (USGS), David Eberhard (ETHZ), Ned Field (USGS), Matt Gerstenberger (GNS Science), Jeanne Hardebeck (USGS), Egill Hauksson (Caltech), Naoshi Hirata (ERI/U Tokyo), James Holliday (UC Davis), Tran Huynh (SCEC/USC), Dave Jackson (UCLA), Lucy Jones (USGS), Tom Jordan (SCEC/USC), Yan Kagan (UCLA), Masha Liukis (SCEC/USC), Andrea Llenos (USGS), Phil Maechling (SCEC/USC), Andy Michael (USGS), Kazu Nanjo (NIED), Yosi Ogata (ISM/U Tokyo), Morgan Page (USGS), Tom Parsons (USGS), Peter Powers (USGS), John Rundle (UC Davis), Bruce Shaw (LDEO/Columbia), Terry Tullis (Brown), Don Turcotte (UC Davis), Steve Ward (UCSC), Max Werner (Princeton), Jeremy Zechar (ETHZ), Jiancang Zhuang (ISM/U Tokyo)

SUMMARY: The topics of this two-day workshop held June 6-7 in Rancho Mirage, California, were (i) a thorough evaluation of the five-year Regional Earthquake Likelihood Models (RELM) initiative, (ii) lessons for the design of future forecast evaluations, and (iii) operational earthquake forecasting and validation. The RELM experiment, conducted within the Collaboratory for the Study of Earthquake Predictability (CSEP) at SCEC, helped clarify many ideas about how to conduct earthquake forecast evaluations; it sparked further research into model development and evaluation; and served as a blueprint for numerous similar experiments within CSEP around the globe. This workshop facilitated the dissemination of the RELM results and provided a forum for the discussion and future planning of earthquake predictability studies. Several important avenues for future research were identified, including finite-size rupture and improved short-term forecasting experiments. These are important because of their relevance for hazard purposes (e.g., for the Uniform California Earthquake Rupture Forecast and Operational Earthquake Forecasting by official agencies) and as probes into the physics of earthquake triggering (e.g., Coulomb stress transfer).

Presentation slides may be downloaded by clicking the pdf links following the title. PLEASE NOTE: Slides are the author’s property. They may contain unpublished or preliminary information and should only be used while viewing the talk.


19:00 Group Reception and Dinner Sunrise Terrace


07:30 Group Breakfast Salon A
08:30 Introduction and Meeting Objectives M. Werner / T. Jordan
  Session 1: First-Order Results from the RELM Experiment J. Hardebeck, moderator
08:40 The RELM Experiment: Purpose and Overview N. Field
08:50 Review of Experiment Design and Results from Likelihood Tests J. Zechar
09:20 An Evaluation of RELM Test Results D. Turcotte
09:40 Group Discussion
  • What can we conclude from these results about the models?
  • How and why do the two views differ?
  • Are the tests adequate?
  • How can we ensure broader participation?
10:00 Break  
  Session 2: Further Results from the RELM Experiment Bruce Shaw, moderator
10:15 Accounting for Catalog Uncertainties: Western Pacific and RELM D. Eberhard
10:30 How Much Information is There in Any Five-Year Forecast? M. Gerstenberger
10:45 Earthquake Occurrence Hypotheses and the RELM Results M. Werner
11:00 Bayesian Approach to Evaluating Forecasts and Constructing Ensemble Forecasts and Also a Peer-to-Peer Gambling Score for Evaluating Forecasts J. Zechar for W. Marzocchi
11:15 Group Discussion
  • How representative are the RELM results?
  • How important are data uncertainties?
  • Which performance metrics reveal which forecast characteristics?
  • What can we conclude about the forecasts’ underlying hypotheses?
  • How do we construct reference models for other experiments?
  • How should model development be encouraged?
12:00 Group Lunch Salon A
  Session 3: Reports on Regional CSEP Activities P. Maechling, moderator
13:00 California T. Jordan
13:15 Current Status and Future Plans of the CSEP-New Zealand Testing Center M. Gerstenberger
13:30 CSEP Activity in Japan: Prospective Earthquake Forecast Experiments N. Hirata
13:45 CSEP-China and Europe D. Eberhard for A. Mignan
14:00 Global Earthquake Forecasting Based on Smoothed Seismicity and Tectonic Strain D. Jackson
14:15 CSEP Software Development: Status, Priorities, Practices M. Liukis
14:30 Group Discussion
  • How should the regional CSEP nodes collaborate?
  • How can we attract funding?
  • How should global experiments be conducted?
14:45 Break  
  Session 4: Beyond RELM - The Future of Forecasting J. Rundle, moderator
15:00 Some Residual Analysis Methods for Space-Time Point Processes R. Schoenberg
15:15 CSEP Results from Time-Dependent Earthquake Forecasts for the M9 Tohoku Sequence K. Nanjo
15:30 Some Issues and Proposals for Operational Space-Time Forecasting and Their Evaluations Y. Ogata
15:45 Group Discussion
  • Fixed-interval versus event-based forecasting and testing
16:15 Stochastic Approach to Faults, Earthquakes, and Forecasting in California D. Jackson for S. Hiemer
16:30 Statistical Seismology: Rogue earthquakes that are not ROGUE Y. Kagan
16:35 Group Discussion
  • How can finite-size rupture forecasts be cast and tested?
17:00 Adjourn  
19:00 Group Dinner Sunrise Terrace


07:30 Group Breakfast Salon A
  Session 5: Beyond RELM - The Future of Forecasting (continued) S. Ward, moderator
08:30 What About Coulomb? A. Michael for T. Parsons
08:45 Using Earthquake Simulators for Earthquake Forecasting and Their Testing in CSEP T. Tullis
09:00 Challenges/Opportunities to Improve CSEP Scoring and Classes P. Bird
09:15 Scoring Annual Earthquake Predictions in China J. Zhuang
09:30 Group Discussion
  • External registration and testing of forecasts and predictions in CSEP
  • How can simulators be validated?
  • How should future forecast evaluations be designed?
10:00 Break  
  Session 6: Overview, Purpose, and Scope of Operational Earthquake Forecasting (OEF) M. Page, moderator
10:15 CSEP Plans and OEF Requirements T. Jordan
10:30 Making Earthquake Forecasting in California Actually Operational [Jones, Field] L. Jones and N. Field
10:45 Time-Dependent Modeling and OEF for the Canterbury Earthquake Sequence M. Gerstenberger
11:00 Earthquake Statistics and Probabilistic Forecasting for the Southern Kanto After the 2011 Mw9.0 Tohoku-Oki Earthquake N. Hirata
11:15 Group Discussion
  • What is CSEP's role in OEF?
  • How important are data uncertainties?
  • Is forecasting before validation valid?
  • After clustering, what is next for OEF?
12:00 Group Lunch Salon A
  Session 7: Next Steps for OEF - Sooner Rather Than Later P. Powers, moderator
13:00 STEP versus ETAS and OEF Testing Strategies J. Zechar
13:15 Strategies for Retrospective Testing Including STEP-Variants A. Michael
13:30 SCSN/CISN Real-Time and Post-Processing Magnitudes E. Hauksson
13:45 Group Discussion
  • How can we reduce catalog latency in OEF?
  • How will ComCat affect CSEP operations?
  • Do we want to test OEF in real-time or with the final catalog?
  • Can retrospective tests matter?
14:15 Break  
  Session 8: When Models and Data Keep Changing M. Blanpied, moderator
14:30 The Value of CSEP for the National Earthquake Prediction Evaluation Council (NEPEC) T. Tullis
14:45 R-Test D. Jackson
15:00 What Can and Can't CSEP do for the WGCEP and NSHMP? N. Field
15:15 Group Discussion
  • What have we learned?
  • What should we change?
  • Who picks the tests?
  • The lifetime of models versus the lifetime of tests
  • The promise and limits of global testing
15:30 Session 9: Wrap-Up and Recommendations  
16:30 Adjourn  


  1. Interpretation of the RELM Results
    • RELM results were disseminated and discussed among the modelers, testers and the wider CSEP community. Seismicity‐based models appeared to perform best (with Helmstetter’s model leading the group), with GPS‐based models not lagging far behind, while fault‐based models appeared to poorly anticipate the spatial distribution of seismicity. A physics‐based simulator did not perform well.
    • Accounting for data (catalog) uncertainties was agreed not to be straightforward, but in the case of RELM it may not be critical. Eberhard described a method for creating perturbed catalogs by adding noise to the observed catalog and recalculating the results. Perturbed catalogs tended to “smear out”, but mostly preserve, the original results. Bird remarked that perturbed catalogs effectively further smear what is already a blurred representation of the truth (and thus introduce further bias). Zechar and Eberhard countered that using perturbed catalogs provides insight into the robustness of a model's predictive performance.
    • Different tests and performance metrics probe different aspects of the forecasts. Participants agreed that there exists no single best test, so that metrics and their merits need to be clearly understood. J. Zechar and R. Schoenberg emphasized that the CSEP consistency tests (i.e., those that compare a single forecast with the observations) cannot be used to rank forecasts.
    • Understanding the robustness of the results with respect to the duration and timing of the 5‐year target period appears critical and yet difficult. Retrospective tests from 1981 (by Werner) tended to support the prospective results. However, synthetic catalogs bootstrapped from observations since 1932 (Gerstenberger) showed substantial variations in the performance of the Helmstetter model.
    • The RELM experiment design included assumptions that were greatly simplifying, but, to some extent, problematic. The assumed independence of space‐magnitude bins and the Poisson distribution were discussed. Werner showed that more reliable negative‐binomial number distributions for each forecast led to fewer N‐test rejections, but also broader ranges of acceptance.
    • Field remarked that for the purposes of seismic hazard, rupture forecasts are required, rather than epicenter forecasts.
    • Participants noted that several of the RELM models continue to be improved, partly as a result of RELM.
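
As a rough illustration of the number-test point above (fewer rejections but a broader acceptance range under a negative-binomial distribution), the following Python sketch compares a Poisson and a negative-binomial N-test for a single forecast. The forecast rate, observed count, and dispersion parameter `r` are hypothetical values chosen for illustration, and the two-sided quantile formulation is one common way to state the N-test; none of this is taken from the workshop material.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability of exactly k events given an expected count mu."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def negbin_pmf(k, mu, r):
    """Negative-binomial probability of k events with mean mu and
    dispersion r (variance = mu + mu**2 / r)."""
    p = r / (r + mu)
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)

def n_test(n_obs, pmf):
    """Return (delta1, delta2) = (P(N >= n_obs), P(N <= n_obs))."""
    below = sum(pmf(k) for k in range(n_obs))  # P(N < n_obs)
    return 1.0 - below, below + pmf(n_obs)

def rejected(deltas, alpha=0.05):
    """Two-sided rejection: either tail probability falls below alpha / 2."""
    return min(deltas) < alpha / 2

# Hypothetical numbers: the forecast expects 20 events, 35 are observed.
mu, n_obs, r = 20.0, 35, 5.0
poisson_deltas = n_test(n_obs, lambda k: poisson_pmf(k, mu))
negbin_deltas = n_test(n_obs, lambda k: negbin_pmf(k, mu, r))

print("Poisson rejected:", rejected(poisson_deltas))
print("Negative binomial rejected:", rejected(negbin_deltas))
```

With these illustrative numbers the Poisson version rejects the forecast while the overdispersed version does not, which is the qualitative trade-off described above.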
  2. CSEP Status
    • Participants reported on the status of regional CSEP activities in California, Japan, New Zealand, China and Europe. The current CSEP software distribution via CSEP @ SCEC was greatly appreciated.
    • The ongoing global CSEP experiment was discussed as a potential solution to the relatively small number of large earthquakes in the RELM experiment.
  3. Future of Earthquake Forecasting
    • Schoenberg and Ogata described methods for point‐process forecasts that are continuous in space and time and do not require discrete forecast intervals or spatial cells, which removes the need for the independence and Poisson assumptions for the class of qualified models. Likelihood evaluation and residual analysis were emphasized.
    • Jackson presented stochastic, finite‐size rupture forecasts. Hypocenters are based on a combination of smoothed seismicity and tectonic strain maps, and ruptures were forecast using a spatially smoothed focal mechanism density and magnitude‐length scaling relationships.
    • Results by Segou, Parsons, and Ellsworth suggested that hybrid Coulomb/ETAS forecasts with secondary triggering outperform standard Coulomb forecasts.
  4. OEF
    • Jordan described the tasks of the joint USGS/SCEC working group on OEF. The tasks are: reducing CSEP testing latency from 1 day to 1 hour or less; establishing long‐term and short‐term reference models (e.g., NSHMP and STEP); reducing catalog latency and developing testing procedures that address the immature nature of real‐time catalog data; developing testable UCERF versions and components; and expanding CSEP activities to include retrospective testing.
    • Jones presented a draft USGS strategy for OEF with several stages: 1) research to develop new methods, 2) model development to make the algorithms work in real‐time and make them testable, 3) validation of the methods by CSEP, 4) product design that incorporates social science to best communicate to users, 5) operation of the models and dissemination of products, and 6) evaluation of the products to see if they are properly communicating the forecasts and producing the desired actions on the part of end‐users.
    • Field discussed a strategy for creating short‐term ETAS‐inspired forecasts that sample ruptures from the long‐term UCERF model. This has the advantage that the short‐term forecasts only produce large events that have been deemed possible for the long‐term model. This implementation requires elastic‐rebound corrections to remove sources that have occurred from the model in real‐time. Otherwise, in areas that have strongly characteristic magnitude‐frequency distributions, non‐physical runaway sequences can occur in the results.
    • Hauksson described real‐time data streaming and processing at the CISN, including different magnitude scales and their availability as a function of time from the NCSN and the SCSN. There was a discussion of how real‐time data affects forecasts and their tests. Ideally, the tests should be done using the best catalog. However, incompatibilities between the real‐time data and the final catalog could introduce errors into the forecast models that affect the tests. But the forecasts must be done using the real‐time data in order to be societally useful. Temporal and spatial variations in catalog completeness, especially after large earthquakes, need to be estimated and taken into account both for producing the forecasts and for testing them.
    • Michael discussed strategies for retrospective testing of short‐term models, including STEP‐variants. One strategy is to test variants of complex models, such as STEP, by turning individual components off to determine which features are critical to producing good results. These features can then be incorporated into other models. Another strategy is to retrospectively test models that have tunable parameters by dividing the data into training and test sets. Because data quality can vary with time and large isolated sequences can have strong effects on the tests, instead of dividing the data into the two sets at a single point in time, the training and test sets could be produced by separating alternating time periods such as years, months, or even individual events.
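
The alternating-period split described by Michael can be sketched in Python as follows. The function name `alternating_split`, the `(origin_time, magnitude)` tuple layout, and the toy catalog are assumptions made for illustration; they are not part of any CSEP software.

```python
from datetime import datetime

def alternating_split(events, period="year"):
    """Split a catalog into training and test sets by alternating periods.

    events: list of (origin_time, magnitude) tuples (hypothetical layout).
    period: "year", "month", or "event" (alternate individual events).
    Even-numbered periods go to the training set, odd-numbered to the test set.
    """
    train, test = [], []
    ordered = sorted(events, key=lambda e: e[0])
    for i, event in enumerate(ordered):
        t = event[0]
        if period == "year":
            index = t.year
        elif period == "month":
            index = t.year * 12 + (t.month - 1)
        elif period == "event":
            index = i
        else:
            raise ValueError(f"unknown period: {period!r}")
        (train if index % 2 == 0 else test).append(event)
    return train, test

# Toy catalog with made-up origin times and magnitudes.
catalog = [
    (datetime(2000, 3, 1), 4.1),
    (datetime(2001, 7, 15), 5.0),
    (datetime(2002, 1, 9), 4.6),
    (datetime(2002, 11, 30), 4.3),
]

train_by_year, test_by_year = alternating_split(catalog, "year")
train_by_event, test_by_event = alternating_split(catalog, "event")
```

Interleaving the two sets in this way spreads any time-varying data quality, and any single large sequence, across both training and testing, rather than concentrating it on one side of a single cut-off date.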


  1. RELM
    • The RELM experiment was recognized as a successful first prospective and comparative earthquake forecasting experiment that clarified ideas about how forecasts might be cast and evaluated. RELM has led to intensified research into both forecast evaluation and model development.
    • Although the presence of models based on a variety of input data (GPS, seismicity, faults, stress‐directions) was noted, the relative absence of physics‐based models was lamented.
    • Participants recommended continuing research into the robustness of the results with respect to target period and duration.
    • As UCERF 2 was not tested in RELM, participants suggested retrospectively testing UCERF 2 against the RELM observations. This test would still be meaningful as a prospective test, as UCERF 2 was “essentially” completed by the start of the RELM experiment.
  2. CSEP
    • It was recommended to continue studying and improving the design of forecast evaluations over RELM for different types of experiments.
    • Round‐robin evaluations, such as those presented by Zechar, would provide useful information.
    • Finite‐size rupture forecasts and their evaluation would be particularly useful for comparison with UCERF and seismic hazard maps.
    • Hybrid Coulomb/ETAS models were recommended to be studied further.
    • To obtain a larger data set with more large earthquakes, it was recommended to continue global forecasting experiments.
    • To probe qualified (continuous) point‐process models such as STEP and ETAS, it was suggested to design event‐based forecast experiments for such models.
    • Participants also suggested building capabilities for model development within CSEP, such as model averaging (as presented by Zechar) or mixing.
  3. OEF
    • CSEP was identified as playing a vital role in the validation of operational forecasts. A useful role includes (i) providing existing (and accumulating) results of relevant models, and (ii) registering externally generated operational forecasts by official agencies and validating them against data and reference models.
    • Tullis, as chair of NEPEC, recommended the validation of OEF within CSEP as a great potential benefit to NEPEC in the evaluation of claims of earthquake predictions and increased hazards.
    • Gerstenberger, based on his experience during the 2010‐12 Canterbury, New Zealand, sequence, recommended planning and practicing the communication of time‐varying hazard and probabilities well in advance of earthquakes. He further emphasized that 24‐hour forecasts were of relatively little use compared to forecasts for weeks, months and years.


  1. RELM
    • Complete the studies on the robustness of results with respect to target period and duration.
    • Retrospectively test the long‐term and short‐term UCERF3 models.
  2. CSEP
    • Solicit physics‐based models.
    • Implement within CSEP methodologies for model building, such as model averaging and mixing.
    • Solicit a greater number of models competing in global experiments.
    • Prototype and implement time‐dependent event‐based experiments.
    • Develop strategies for finite‐sized rupture forecasts and validation.
    • Strengthen international collaboration through further workshops.
  3. OEF and CSEP
    • Retrospectively evaluate clustering models such as ETAS and STEP, using standard CSEP tests and the strategies proposed by Michael.
    • Implement a short‐term clustering component of UCERF3.
    • Implement CSEP capabilities for the registration and validation of external forecasts and predictions.
    • Reduce CSEP forecast intervals from 24 hours to 1 hour or less.
    • Study the quality of real‐time catalogs and assess impacts on OEF and tests.
    • Evaluate the capabilities of the new USGS data distribution method, ComCat, for real‐time OEF.
    • Develop strategies for the communication of OEF with social scientists.