Theory-constrained Data-driven Model Selection, Specification, and Estimation: Applications in Discrete Choice Models
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2022 |
Online Access: | https://hdl.handle.net/1721.1/143299 https://orcid.org/0000-0002-0829-1696 |
Summary: | This thesis provides a framework, with demonstrated applications, for carefully bringing data-driven flexibility to the specification and model selection of discrete choice models while maintaining usability for analysis. Assumptions brought to bear under the classical theory-based paradigm enjoy varying degrees of credibility. Some are rooted in economic theory (e.g., utility-maximizing behavior) or in information available to the scientist about the data-generating process (e.g., exogeneity). These assumptions can be argued to be highly credible. Others are driven by convenience, convention, the pursuit of smaller standard errors, or the lack of a systematic specification and model selection process (e.g., restrictive functional and distributional forms, and trial-and-error specification testing). These assumptions are arguably less credible.
Our goal is to overcome some of the arbitrary specification and model selection practices that undermine credibility. To this end, theory-constrained data-driven flexibility in specification is introduced to discrete choice models through an optimization framework. Systematic data-driven methods for model selection are used to enhance replicability. The introduced flexibility is constrained to guarantee trustworthiness of predictions through consistency with theory. At the same time, the imposed constraints are validated through hypothesis tests to maintain credibility.
The framework we introduce positions us well to realize synergies between the data-driven and theory-based paradigms. The starting point for our approach is discrete choice models with well-established theoretical underpinnings that facilitate causal and behavioral interpretations. Discrete choice models consistent with random utility maximization, for example, are tethered to microeconomics and enable sound economic and welfare valuations. Further, the entire machinery of econometrics remains applicable to address endogeneity issues. This contrasts with emerging trends in the literature that start with data-driven classifiers in pursuit of predictive gains and then, as an afterthought, attempt to reconcile their output with theory.
We provide applications of our proposed framework in addressing specification aspects of both the systematic and stochastic components of discrete choice models. Specialized solution algorithms are developed for each application, leveraging some of the latest advances in mixed-integer and conic optimization (for classical estimation) and in Markov chain Monte Carlo methods (for Bayesian inference). The methods developed are tested for consistency using synthetic data and applied to empirical data. |
---|