Uplift modeling for randomized experiments and observational studies

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.

Bibliographic Details
Main Author:	Fang, Xiao, Ph. D. Massachusetts Institute of Technology
Other Authors:	David Simchi-Levi.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2018
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/115770

author	Fang, Xiao, Ph. D. Massachusetts Institute of Technology
author2	David Simchi-Levi.
collection	MIT
description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
first_indexed	2024-09-23T11:19:23Z
format	Thesis
id	mit-1721.1/115770
institution	Massachusetts Institute of Technology
language	eng
last_indexed	2024-09-23T11:19:23Z
publishDate	2018
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/115770 2019-04-10T23:21:37Z Uplift modeling for randomized experiments and observational studies Fang, Xiao, Ph. D. Massachusetts Institute of Technology David Simchi-Levi. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 101-107). Uplift modeling refers to the problem of identifying, from a set of treatments, the candidate that leads to the most desirable outcome given a subject's characteristics. Most work in the last century focused on the average effect of a treatment across a population of interest and ignored subject heterogeneity, which is very common in the real world. Recently there has been an explosion of empirical settings that make it possible to infer individualized treatment responses. We first consider the problem with data from randomized experiments. We put forward an unbiased estimate of the expected response, which makes it possible to evaluate an uplift model with multiple treatments. This is the first evaluation metric for uplift models in the literature that aligns with the problem objective. Based on this evaluation metric, we design an ensemble tree-based algorithm (CTS) for uplift modeling. The splitting criterion and termination conditions are derived with the special structure of the uplift modeling problem in mind. Experimental results on synthetic data and industry data show the advantage of our specialized uplift modeling algorithm over the separate model approach and other existing uplift modeling algorithms. We next prove the asymptotic properties of a simplified CTS algorithm.
The exhaustive search for locally optimal splitting points makes tree-based algorithms difficult to analyze theoretically. We therefore adopt dyadic splits in the CTS algorithm and obtain a bound on the regret, the expected performance difference between the oracle and our algorithm. The convergence rate of the regret depends on the feature dimension, which emphasizes the importance of feature selection. While model performance usually improves with the number of features, exponentially more data is required to approximate the optimal treatment rule. Choosing the appropriate complexity of the model and selecting the most powerful features are critical to achieving desirable performance. Finally, we study the uplift modeling problem in the context of observational studies. In observational studies, treatment selection is influenced by subject characteristics. As a result, baseline characteristics often differ systematically between treatments. Thus confounding factors need to be untangled before valid predictions can be made. We combine a modification of the standard feed-forward architecture with our CTS algorithm to optimize predictive accuracy and minimize the feature distribution distance between treatments. Experimental results on synthetic data show that the combination of neural network feature representation and an ensemble tree-based model is promising for handling real-world problems. by Xiao Fang. Ph. D. 2018-05-23T16:34:03Z 2018-05-23T16:34:03Z 2018 2018 Thesis http://hdl.handle.net/1721.1/115770 1036987550 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582 107 pages application/pdf Massachusetts Institute of Technology
title	Uplift modeling for randomized experiments and observational studies
topic	Electrical Engineering and Computer Science.
url	http://hdl.handle.net/1721.1/115770
work_keys_str_mv	AT fangxiaophdmassachusettsinstituteoftechnology upliftmodelingforrandomizedexperimentsandobservationalstudies
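
Illustrative note on the evaluation metric mentioned in the abstract. The record does not give the thesis's formula, so the following is only a minimal sketch of one standard way to estimate the expected response of a treatment rule from randomized-experiment data: keep the subjects whose randomly assigned treatment matches the rule's recommendation and weight each response by the inverse of that treatment's assignment probability. The function name, argument names, and toy data are assumptions made for illustration, and treatments are assumed to be assigned independently of subject features with known probabilities.

```python
from typing import Callable, Dict, Hashable, Sequence


def estimate_expected_response(
    features: Sequence,
    treatments: Sequence[Hashable],
    responses: Sequence[float],
    policy: Callable[[object], Hashable],
    assign_prob: Dict[Hashable, float],
) -> float:
    """Inverse-probability-weighted estimate of E[Y | T = policy(X)].

    Assumes (hypothetically) that each subject received treatment t with
    known probability assign_prob[t], independent of the features.
    """
    if not (len(features) == len(treatments) == len(responses)):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for x, t, y in zip(features, treatments, responses):
        # Only subjects whose assigned treatment agrees with the rule
        # contribute; 1 / p_t corrects for how often that happens.
        if policy(x) == t:
            total += y / assign_prob[t]
    return total / len(features)


# Toy data: two treatments (0 and 1), each assigned with probability 0.5.
xs = [0, 1, 0, 1]
ts = [0, 1, 1, 0]
ys = [1.0, 2.0, 5.0, 7.0]
probs = {0: 0.5, 1: 0.5}

# Rule "recommend treatment x": subjects 1 and 2 match,
# so the estimate is (1.0 / 0.5 + 2.0 / 0.5) / 4 = 1.5.
print(estimate_expected_response(xs, ts, ys, lambda x: x, probs))
```

Under the stated randomization assumption, the average of the weighted terms has expectation equal to the mean response that would be obtained if every subject received the treatment the rule recommends, which is what allows rules over more than two treatments to be compared on held-out experimental data.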

