Essays on Experimental Design

This thesis contains three chapters. In the first chapter, I propose new econometric methods for panel data settings with unobserved cross-sectional heterogeneity. Previous approaches model this heterogeneity by assigning each unit a one-dimensional, discrete latent type, which can be estimated by r...


Bibliographic details
Main author: Cytrynbaum, Max
Other authors: Abadie, Alberto
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online access: https://hdl.handle.net/1721.1/144523
_version_ 1826213288339308544
author Cytrynbaum, Max
author2 Abadie, Alberto
author_facet Abadie, Alberto
Cytrynbaum, Max
author_sort Cytrynbaum, Max
collection MIT
description This thesis contains three chapters. In the first chapter, I propose new econometric methods for panel data settings with unobserved cross-sectional heterogeneity. Previous approaches model this heterogeneity by assigning each unit a one-dimensional, discrete latent type, which can be estimated by regression clustering methods. In this paper, I show that such models can be misspecified, even when the panel has significant discrete cross-sectional structure. Motivated by this finding, I generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple, imperfectly-correlated latent variables that describe its response-type to each covariate. I develop valid inference methods using a k-means style estimator of our model and propose information criteria to jointly select the number of clusters for each latent variable. I also contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting. My results suggest that over-fitting can be severe in k-means style estimators when the number of clusters is over-specified.

The second chapter studies treatment effect estimation in a novel two-stage model of experimentation. In the first stage, using baseline covariates, the researcher selects units to participate in the experiment from a sample of eligible units. Next, they assign each selected unit to one of two treatment arms. I relate estimator efficiency to representative selection of participants and balanced assignment of treatments. I define a new family of local randomization procedures, which can be used for both selection and assignment. This family nests stratified block randomization and matched pairs, the most commonly used designs in practice in development economics, but also produces many useful new designs, embedding them in a unified framework. When used to select representative units into the experiment, local randomization boosts effective sample size, making estimators behave as if they were estimated using a larger experiment. When used for treatment assignment, local randomization does model-free non-parametric regression adjustment by design. I give novel asymptotically exact inference methods for locally randomized selection and assignment, allowing experimenters to report smaller confidence intervals if they designed a representative experiment. I apply our methods to the setting of two-wave design, where the researcher has access to a pilot study when designing the main experiment. I use local randomization methods to give the first fully efficient solution to this problem.

The third chapter studies rerandomization and linear adjustment for average treatment effect estimation in stratified experiments. Our results show that in stratified experiments, ex-post regression adjustment can be strictly inefficient relative to difference of means estimation. Thus, the "agnostic" efficiency improvement of Lin (2013) is atypical, corresponding to the edge case of complete randomization (no stratification). The problem arises because ex-post regression adjustment does not adapt to the stratification. In particular, it estimates the same linear adjustment coefficient for any locally randomized design. By contrast, I show that ex-ante rerandomization within strata does adaptive linear adjustment by design. In the tight acceptance criterion limit, rerandomization within strata is as efficient as the optimal linear adjustment for a given stratification. Equivalently, I show that rerandomization finds the optimal semiparametric completion of the non-parametric model produced by local randomization.
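As a rough illustration of the matched-pairs design that the second chapter names as a special case of local randomization, the sketch below pairs units with adjacent values of a single baseline covariate and treats one unit per pair at random. This is a generic textbook-style matched-pairs assignment, not the thesis's procedure; the function name `matched_pairs_assignment`, the scalar covariate, the toy data, and the rule that an odd leftover unit stays untreated are all assumptions made for the example.

```python
import random


def matched_pairs_assignment(covariate, seed=0):
    """Hypothetical helper: pair units by covariate rank, treat one per pair.

    Returns (assignment, pairs), where assignment[i] is 1 if unit i is
    treated and 0 otherwise, and pairs lists the matched index pairs.
    """
    rng = random.Random(seed)
    # Sort unit indices by the baseline covariate so neighbours are similar.
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    assignment = [0] * len(covariate)
    pairs = []
    # Take sorted units two at a time; with an odd count, the last unit
    # is left unpaired and untreated (an assumption of this sketch).
    for j in range(0, len(order) - 1, 2):
        a, b = order[j], order[j + 1]
        treated = a if rng.random() < 0.5 else b
        assignment[treated] = 1
        pairs.append((a, b))
    return assignment, pairs


# Toy baseline covariate for six eligible units (made-up numbers).
x = [2.1, 0.3, 1.7, 0.4, 3.0, 2.9]
assignment, pairs = matched_pairs_assignment(x)
```

Each pair contributes exactly one treated and one control unit, so treatment is balanced on the covariate by construction rather than by after-the-fact adjustment.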
first_indexed 2024-09-23T15:46:41Z
format Thesis
id mit-1721.1/144523
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T15:46:41Z
publishDate 2022
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/144523 2022-08-30T03:08:08Z Essays on Experimental Design Cytrynbaum, Max Abadie, Alberto Mikusheva, Anna Massachusetts Institute of Technology. Department of Economics Ph.D. 2022-08-29T15:53:16Z 2022-08-29T15:53:16Z 2022-05 2022-06-06T12:48:42.993Z Thesis https://hdl.handle.net/1721.1/144523 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle Cytrynbaum, Max
Essays on Experimental Design
title Essays on Experimental Design
title_full Essays on Experimental Design
title_fullStr Essays on Experimental Design
title_full_unstemmed Essays on Experimental Design
title_short Essays on Experimental Design
title_sort essays on experimental design
url https://hdl.handle.net/1721.1/144523
work_keys_str_mv AT cytrynbaummax essaysonexperimentaldesign