Faster and easier: cross-validation and model robustness checks

Bibliographic Details
Main Author: Stephenson, William T.
Other Authors: Broderick, Tamara
Format: Thesis
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/143247
Description
Summary: Machine learning and statistical methods are increasingly used in high-stakes applications: for instance, in policing crime, making predictions about the atmosphere, or providing medical care. Before we use our methods in such applications, though, we want to assess the extent to which we can trust them. Assessment tools such as cross-validation (CV) and robustness checks exist to help us understand exactly how trustworthy our methods are. In both cases, a typical workflow follows the pattern "change the dataset or method, then rerun the analysis." However, this workflow (1) requires users to specify the set of relevant changes and (2) requires a computer to repeatedly refit the model. For methods involving large and complex models, (1) is expensive in user time and (2) is expensive in compute time. So CV, which requires (2), and robustness checks, which often require both (1) and (2), see little use in the large and complex models that need them the most.

In this thesis, we address these challenges by developing model-evaluation tools that are fast in terms of both compute time and user time. We develop tools to approximate CV where it is most computationally expensive: in high-dimensional and complex, structured models. But approximating CV implicitly relies on the quality of CV itself. We present theory and empirical results that call into question the reliability of CV for quickly and automatically tuning model hyperparameters, even in cases where the behavior of CV is thought to be relatively well understood.

On the robustness-check side, a common workflow in Bayesian prior robustness requires users to manually specify a set of alternative reasonable priors, a task that can be time-consuming and difficult. We develop automatic tools that search for a prediction-changing alternative prior for Gaussian processes, saving users from having to specify the set of alternative priors by hand.
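
The abstract is necessarily compact, so a small sketch may help make the core idea of approximate CV concrete. The sketch below is not the thesis's code or method: it illustrates, for ridge regression (chosen purely because both exact and approximate leave-one-out CV are easy to write down), how a single closed-form Newton-step correction from the full-data fit can stand in for each leave-one-out refit. All names here (fit_ridge, loo_exact, loo_approx) are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lam = 200, 10, 1.0
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

    def fit_ridge(X, y, lam):
        # Minimizes 0.5 * ||y - X theta||^2 + 0.5 * lam * ||theta||^2.
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    # Naive leave-one-out CV: n separate refits, one per held-out point.
    loo_exact = np.empty(n)
    for i in range(n):
        theta_i = fit_ridge(np.delete(X, i, axis=0), np.delete(y, i), lam)
        loo_exact[i] = X[i] @ theta_i

    # Approximate CV: one full-data fit, then a closed-form Newton-step
    # correction per held-out point (an infinitesimal-jackknife-style idea).
    theta = fit_ridge(X, y, lam)
    H = X.T @ X + lam * np.eye(d)        # Hessian of the full objective
    resid = y - X @ theta
    lev = np.einsum('ij,jk,ik->i', X, np.linalg.inv(H), X)  # x_i^T H^{-1} x_i
    loo_approx = X @ theta - resid * lev

    print(np.max(np.abs(loo_exact - loo_approx)))  # agreement is typically tight

The point of the sketch is the cost structure: the naive loop solves n separate d-by-d systems, while the approximation fits once and reuses one Hessian for every point. The thesis develops approximations of this flavor for the much harder high-dimensional and structured settings where the naive loop is prohibitive.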
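
The prior-robustness question can likewise be made concrete with a deliberately crude stand-in: sweep a Gaussian process's prior lengthscale over a grid of "reasonable" values and check whether any of them changes a prediction of interest. The thesis replaces this manual grid with an automatic search; everything below (the synthetic data, the gp_mean helper, the grid of lengthscales) is illustrative rather than the thesis's method.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(-3.0, 3.0, size=20)
    y = np.sin(x) + 0.1 * rng.normal(size=20)
    x_star, noise_var = 4.0, 0.1 ** 2  # query point and observation noise

    def gp_mean(x, y, x_star, lengthscale):
        # Posterior mean of GP regression under an RBF (squared-exponential) prior.
        K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / lengthscale ** 2)
        k_star = np.exp(-0.5 * (x - x_star) ** 2 / lengthscale ** 2)
        return k_star @ np.linalg.solve(K + noise_var * np.eye(len(x)), y)

    # A manually specified set of "reasonable" priors: the workflow the thesis
    # aims to automate.
    for ls in [0.3, 0.5, 1.0, 2.0, 4.0]:
        print(f"lengthscale {ls:3.1f}: mean prediction at x*={x_star} is "
              f"{gp_mean(x, y, x_star, ls):+.3f}")
    # If the sign or the downstream decision changes across these settings, the
    # conclusion is not robust to the prior; if it does not change, the grid
    # gives no guarantee either way, which is why an automatic search for a
    # prediction-changing alternative prior is valuable.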