Summary: | Model selection from a general unrestricted model (GUM) can potentially confront three very different environments: over-, exact, and under-specification of the data generation process (DGP). In the first, and most-studied, setting, the DGP is nested in the GUM, and the main role of general-to-specific (Gets) selection is to eliminate the irrelevant variables while retaining the relevant ones. In an exact specification, the theory formulation is precisely correct and can always be retained by 'forcing' during selection, but is nevertheless embedded in a broader model where possible omissions, breaks, non-linearity, or data contamination are checked. The most realistic case is where some aspects of the relevant DGP are correctly included, but some are omitted, leading to under-specification. We review the analysis of model selection procedures which allow for many relevant effects but inadvertently omit others, while irrelevant variables are also included in the GUM, and exploit the ability of automatic procedures to handle more variables than observations, and consequently tackle perfect collinearity. Considering all of the possibilities, where it is not known which one obtains in practice, reveals that model selection can excel relative to just fitting a prior specification, yet has very low costs when an exact specification is correctly postulated initially.
|