Summary: | This thesis contains three chapters. In the first chapter, I propose new econometric methods for panel data settings with unobserved cross-sectional heterogeneity. Previous approaches model this heterogeneity by assigning each unit a one-dimensional, discrete latent type, which can be estimated by regression clustering methods. In this chapter, I show that such models can be misspecified, even when the panel has significant discrete cross-sectional structure. Motivated by this finding, I generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple, imperfectly correlated latent variables that describe its response type to each covariate. I develop valid inference methods using a k-means style estimator of the model and propose information criteria to jointly select the number of clusters for each latent variable. I also contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting. My results suggest that over-fitting can be severe in k-means style estimators when the number of clusters is over-specified.
The second chapter studies treatment effect estimation in a novel two-stage model of experimentation. In the first stage, using baseline covariates, the researcher selects units to participate in the experiment from a sample of eligible units. In the second stage, they assign each selected unit to one of two treatment arms. I relate estimator efficiency to representative selection of participants and balanced assignment of treatments. I define a new family of local randomization procedures, which can be used for both selection and assignment. This family nests stratified block randomization and matched pairs, the designs most commonly used in development economics, but also produces many useful new designs, embedding them in a unified framework. When used to select representative units into the experiment, local randomization boosts effective sample size, making estimators behave as if they were estimated using a larger experiment. When used for treatment assignment, local randomization does model-free non-parametric regression adjustment by design. I give novel asymptotically exact inference methods for locally randomized selection and assignment, allowing experimenters to report smaller confidence intervals if they designed a representative experiment. I apply these methods to the setting of two-wave design, where the researcher has access to a pilot study when designing the main experiment. I use local randomization methods to give the first fully efficient solution to this problem.
The third chapter studies rerandomization and linear adjustment for average treatment effect estimation in stratified experiments. I show that in stratified experiments, ex-post regression adjustment can be strictly inefficient relative to difference of means estimation. Thus, the "agnostic" efficiency improvement of Lin (2013) is atypical, corresponding to the edge case of complete randomization (no stratification). The problem arises because ex-post regression adjustment does not adapt to the stratification. In particular, it estimates the same linear adjustment coefficient for any locally randomized design. By contrast, I show that ex-ante rerandomization within strata does adaptive linear adjustment by design. In the tight acceptance criterion limit, rerandomization within strata is as efficient as the optimal linear adjustment for a given stratification. Equivalently, I show that rerandomization finds the optimal semiparametric completion of the non-parametric model produced by local randomization.
|