Uplift Modeling with Multiple Treatments and General Response Types

Bibliographic Details
Main Authors: Zhao, Yan; Fang, Xiao; Simchi-Levi, David
Other Authors: Massachusetts Institute of Technology. Department of Civil and Environmental Engineering
Format: Article
Language: en_US
Published: Society for Industrial and Applied Mathematics 2018
Online Access: http://hdl.handle.net/1721.1/119250
https://orcid.org/0000-0002-4650-1519
https://orcid.org/0000-0003-2761-9615
https://orcid.org/0000-0002-7348-1058
Description
Summary: Randomized experiments have been used to assist decision-making in many areas. They help people select the optimal treatment for the test population with certain statistical guarantees. However, subjects can show significant heterogeneity in response to treatments. The problem of customizing treatment assignment based on subject characteristics is known as uplift modeling, differential response analysis, or personalized treatment learning in the literature. A key feature of uplift modeling is that the data are unlabeled: it is impossible to know whether the chosen treatment is optimal for an individual subject, because the response under alternative treatments is unobserved. This presents a challenge to both the training and the evaluation of uplift models. In this paper we describe how to obtain an unbiased estimate of the key performance metric of an uplift model, the expected response. We present a new uplift algorithm which creates a forest of randomized trees. The trees are built with a splitting criterion designed to directly optimize their uplift performance based on the proposed evaluation method. Both the evaluation method and the algorithm apply to an arbitrary number of treatments and general response types. Experimental results on synthetic data and industry-provided data show that our algorithm leads to a significant performance improvement over other applicable methods.
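
The summary does not spell out the estimator, so the following is only a minimal sketch of one standard way to estimate the expected response of a treatment-assignment rule from randomized-experiment data: inverse-probability weighting. It is not taken from the paper and may differ from the authors' method. The function name `expected_response_ipw` and its parameters are hypothetical, and the sketch assumes that treatments were assigned at random with known probabilities that do not depend on subject characteristics.

```python
import numpy as np


def expected_response_ipw(y, t, p, recommended):
    """Estimate the mean response if each subject received `recommended`.

    Illustrative sketch only (hypothetical names, not the paper's code).
    y           : observed responses, one per subject
    t           : treatment actually assigned in the experiment
    p           : dict mapping each treatment to its assignment probability
    recommended : treatment the uplift model would choose for each subject
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t)
    recommended = np.asarray(recommended)

    # Probability with which each subject's actual treatment was assigned.
    probs = np.array([p[k] for k in t], dtype=float)

    # Only subjects whose assigned treatment matches the recommendation
    # contribute; dividing by the assignment probability compensates for
    # the subjects that are dropped.
    match = (t == recommended)
    return float(np.mean(match * y / probs))


# Toy data: two treatments (0 and 1), each assigned with probability 0.5.
y = [1.0, 0.0, 2.0, 4.0]
t = [0, 1, 0, 1]
recommended = [0, 0, 1, 1]
estimate = expected_response_ipw(y, t, {0: 0.5, 1: 0.5}, recommended)
```

Under the stated randomization assumption, the average of these weighted terms is an unbiased estimate of the expected response under the recommended assignments, because each subject's recommended treatment is observed with exactly the known probability used as the weight.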