Bandit Problems under Censored Feedback

In this thesis, we study sequential decision-making models where the feedback received by the principal depends on strategic uncertainty (e.g., agents’ willingness to follow a recommendation) and/or random uncertainty (e.g., loss or delay in arrival of information). Such challenges often arise in AI...

Full description

Bibliographic Details
Main Author:	Guinet, Gauthier Marc Benoit
Other Authors:	Amin, Saurabh
Format:	Thesis
Published:	Massachusetts Institute of Technology 2023
Online Access:	https://hdl.handle.net/1721.1/147326

_version_	1826201786876166144
author	Guinet, Gauthier Marc Benoit
author2	Amin, Saurabh
author_facet	Amin, Saurabh Guinet, Gauthier Marc Benoit
author_sort	Guinet, Gauthier Marc Benoit
collection	MIT
description	In this thesis, we study sequential decision-making models where the feedback received by the principal depends on strategic uncertainty (e.g., agents’ willingness to follow a recommendation) and/or random uncertainty (e.g., loss or delay in arrival of information). Such challenges often arise in AI-driven platforms, with applications in recommender systems, revenue management or transportation. We model and study this class of problems through the lens of multi-armed and contextual bandits evolving in censored environments. Our goal is to estimate the performance loss due to censorship in the context of classical algorithms designed for uncensored environments. Our main contributions include the introduction of a broad class of censorship models and their analysis in terms of the effective dimension of the problem – a natural measure of its underlying statistical complexity and main driver of the regret bound. In particular, the effective dimension allows us to maintain the structure of the original problem at first order, while embedding it in a bigger space, and thus naturally leads to results analogous to uncensored settings. Our analysis involves a continuous generalization of the Elliptical Potential Inequality, which we believe is of independent interest. We also discover an interesting property of decision-making under censorship: a transient phase during which initial misspecification of censorship is self-corrected at an extra cost; followed by a stationary phase that reflects the inherent slowdown of learning governed by the effective dimension.
first_indexed	2024-09-23T11:56:45Z
format	Thesis
id	mit-1721.1/147326
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T11:56:45Z
publishDate	2023
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/1473262023-01-20T03:36:03Z Bandit Problems under Censored Feedback Guinet, Gauthier Marc Benoit Amin, Saurabh Jaillet, Patrick Massachusetts Institute of Technology. Operations Research Center In this thesis, we study sequential decision-making models where the feedback received by the principal depends on strategic uncertainty (e.g., agents’ willingness to follow a recommendation) and/or random uncertainty (e.g., loss or delay in arrival of information). Such challenges often arise in AI-driven platforms, with applications in recommender systems, revenue management or transportation. We model and study this class of problems through the lens of multi-armed and contextual bandits evolving in censored environments. Our goal is to estimate the performance loss due to censorship in the context of classical algorithms designed for uncensored environments. Our main contributions include the introduction of a broad class of censorship models and their analysis in terms of the effective dimension of the problem – a natural measure of its underlying statistical complexity and main driver of the regret bound. In particular, the effective dimension allows us to maintain the structure of the original problem at first order, while embedding it in a bigger space, and thus naturally leads to results analogous to uncensored settings. Our analysis involves a continuous generalization of the Elliptical Potential Inequality, which we believe is of independent interest. We also discover an interesting property of decision-making under censorship: a transient phase during which initial misspecification of censorship is self-corrected at an extra cost; followed by a stationary phase that reflects the inherent slowdown of learning governed by the effective dimension. S.M. 2023-01-19T18:45:41Z 2023-01-19T18:45:41Z 2022-09 2022-08-09T19:39:03.971Z Thesis https://hdl.handle.net/1721.1/147326 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle	Guinet, Gauthier Marc Benoit Bandit Problems under Censored Feedback
title	Bandit Problems under Censored Feedback
title_full	Bandit Problems under Censored Feedback
title_fullStr	Bandit Problems under Censored Feedback
title_full_unstemmed	Bandit Problems under Censored Feedback
title_short	Bandit Problems under Censored Feedback
title_sort	bandit problems under censored feedback
url	https://hdl.handle.net/1721.1/147326
work_keys_str_mv	AT guinetgauthiermarcbenoit banditproblemsundercensoredfeedback

Bandit Problems under Censored Feedback

Similar Items