Predictive Models and Calibration Analysis in Large-Scale Computational Studies
Computational modeling and simulation are used to study many complex phenomena for which physical experiments are infeasible or too expensive. Examples include climate models, nuclear stockpile analysis, design and manufacturing of complex systems, and biological systems. Statistical methods play a crucial role in this area, ranging from the design of computer experiments and the analysis of their outputs to the development of statistical emulators, calibration analysis and, more generally, uncertainty quantification. This dissertation deals with two aspects of these statistical problems. The first part is concerned with statistical emulators. In most applications of interest, a statistical model is fit to the output from a limited number of evaluations of the computational model, and the resulting "emulator" is used to approximate the input-output relationship. The method of choice is a Gaussian Spatial Process (GaSP), in which the output is viewed as the realization of a Gaussian process. While GaSP can be implemented using frequentist methods, it is most commonly used within a Bayesian framework. We compare the performance of GaSP with flexible regression-based approaches. These include existing methods such as multivariate adaptive regression splines (MARS), smoothing spline ANOVA (SS-ANOVA), and multiple additive regression trees (MART), as well as two methods developed in this dissertation: an expanded multivariate adaptive regression splines model (EMARS) and a smoothing spline model with a kernel function based on exponential products (SS-Prod). Our empirical comparisons show that EMARS has better predictive performance than GaSP in a variety of situations. EMARS can be implemented with the current MARS algorithm. Given its computational advantage, it can be applied to computational models with a larger number of input parameters. The second part of the thesis focuses on the calibration problem, where we have to determine the true (but unknown) values of certain input parameters to the computational model. This is a challenging inverse problem that suffers from identifiability issues. We develop conditions for determining identifiability and examine data-based approaches for checking these conditions in practice. The behavior of the methods is examined in various situations.