Regulating exploration in multi-armed bandit problems with time patterns and dying arms

Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2018.

Bibliographic Details
Main Author: Tracà, Stefano
Other Authors: Roy E. Welsch.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2018
Subjects: Operations Research Center.
Online Access: http://hdl.handle.net/1721.1/119353
Abstract: In retail, there are predictable yet dramatic time-dependent patterns in customer behavior, such as periodic changes in the number of visitors or surges in customers just before major holidays. The standard paradigm of multi-armed bandit analysis does not take these known patterns into account, which means that for applications in retail, where prices are fixed for periods of time, current bandit algorithms will not suffice. This work provides a framework and methods that take the time-dependent patterns into account. In the corrected methods, exploitation (greed) is regulated over time, so that more exploitation occurs during high-reward periods and more exploration occurs during low-reward periods. To explain why the corrected methods reduce regret, a set of bounds on the expected regret is presented, along with insights into why it pays to exploit during periods of high reward. When the set of available options changes over time, mortal-bandit algorithms have proven extremely useful in a number of settings, for example in providing news-article recommendations or running automated online advertising campaigns. Previous work on this problem showed how to regulate exploration of arms when they have recently appeared, but it does not adapt when arms are about to disappear. Since in most applications we can determine, either exactly or approximately, when arms will disappear, we can leverage this information to improve performance: we should not explore arms that are about to disappear. For this framework, too, adaptations of algorithms and regret bounds are provided. The proposed methods perform well in experiments and were inspired by a high-scoring entry in the Exploration and Exploitation 3 contest using data from the Yahoo! Front Page. That entry relied heavily on time-series methods to regulate greed over time, which was substantially more effective than other contextual bandit methods.

Notes: Cataloged from the PDF version of the thesis. Includes bibliographical references (pages 65-70).
Identifiers: http://hdl.handle.net/1721.1/119353 ; 1065541758
Rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source, but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582
Extent: 173 pages, application/pdf
Publisher: Massachusetts Institute of Technology
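The abstract's two ideas — scaling the exploration bonus down during known high-reward periods, and withholding exploration from arms about to disappear — can be illustrated with a minimal sketch. This is not the thesis's actual algorithm; the function names, the `exploration_weight` schedule, and the near-death cutoff are all illustrative assumptions layered on a standard UCB1-style index.

```python
import math
import random

def regulated_ucb(arms, T, reward_multiplier, exploration_weight):
    """Illustrative time-regulated bandit loop (not the thesis's method).

    arms: dict arm_id -> (base_mean_reward, death_time)
    reward_multiplier(t): known time pattern scaling all rewards at round t
    exploration_weight(t): chosen small in high-reward periods (exploit more)
                           and large in low-reward periods (explore more)
    """
    counts = {a: 0 for a in arms}
    sums = {a: 0.0 for a in arms}
    total = 0.0
    for t in range(T):
        alive = [a for a in arms if arms[a][1] > t]
        if not alive:
            break
        untried = [a for a in alive if counts[a] == 0]
        if untried:
            # play each live arm once before trusting the index
            choice = untried[0]
        else:
            def index(a):
                mean = sums[a] / counts[a]
                bonus = exploration_weight(t) * math.sqrt(
                    2 * math.log(t + 1) / counts[a])
                if arms[a][1] - t < 5:  # illustrative cutoff:
                    bonus = 0.0         # no exploration for dying arms
                return mean + bonus
            choice = max(alive, key=index)
        reward = reward_multiplier(t) * (arms[choice][0] + random.gauss(0, 0.1))
        counts[choice] += 1
        sums[choice] += reward / reward_multiplier(t)  # estimate base mean
        total += reward
    return total
```

Pairing a high `reward_multiplier(t)` with a low `exploration_weight(t)` (e.g. one inversely proportional to the other) captures the abstract's prescription that greed should peak exactly when the pattern says rewards do.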