Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles

A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoret...

Full description

Bibliographic Details
Main Authors:	Foster, Dylan J, Rakhlin, Alexander
Other Authors:	Statistics and Data Science Center (Massachusetts Institute of Technology)
Format:	Article
Language:	English
Published:	2021
Online Access:	https://hdl.handle.net/1721.1/138306

_version_	1811087561979854848
author	Foster, Dylan J Rakhlin, Alexander
author2	Statistics and Data Science Center (Massachusetts Institute of Technology)
author_facet	Statistics and Data Science Center (Massachusetts Institute of Technology) Foster, Dylan J Rakhlin, Alexander
author_sort	Foster, Dylan J
collection	MIT
description	A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoretical guarantees have remained elusive except in special cases. We provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any oracle for online regression with a given value function class into an algorithm for contextual bandits with the induced policy class, with no overhead in runtime or memory requirements. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression. Compared to previous results, our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
first_indexed	2024-09-23T13:48:03Z
format	Article
id	mit-1721.1/138306
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T13:48:03Z
publishDate	2021
record_format	dspace
spelling	mit-1721.1/1383062023-02-03T20:29:28Z Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles Foster, Dylan J Rakhlin, Alexander Statistics and Data Science Center (Massachusetts Institute of Technology) Massachusetts Institute of Technology. Institute for Data, Systems, and Society Massachusetts Institute of Technology. Laboratory for Information and Decision Systems Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoretical guarantees have remained elusive except in special cases. We provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any oracle for online regression with a given value function class into an algorithm for contextual bandits with the induced policy class, with no overhead in runtime or memory requirements. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression. Compared to previous results, our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially. 2021-12-03T15:09:50Z 2021-12-03T15:09:50Z 2020 2021-12-03T15:06:05Z Article http://purl.org/eprint/type/JournalArticle https://hdl.handle.net/1721.1/138306 Foster, Dylan J and Rakhlin, Alexander. 2020. "Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles." INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 119. en https://proceedings.mlr.press/v119/foster20a INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119 Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. application/pdf Proceedings of Machine Learning Research
spellingShingle	Foster, Dylan J Rakhlin, Alexander Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
title	Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
title_full	Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
title_fullStr	Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
title_full_unstemmed	Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
title_short	Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
title_sort	beyond ucb optimal and efficient contextual bandits with regression oracles
url	https://hdl.handle.net/1721.1/138306
work_keys_str_mv	AT fosterdylanj beyonducboptimalandefficientcontextualbanditswithregressionoracles AT rakhlinalexander beyonducboptimalandefficientcontextualbanditswithregressionoracles

Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles

Similar Items