Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles

A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than those of classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoretical guarantees have remained elusive except in special cases. We provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any oracle for online regression with a given value function class into an algorithm for contextual bandits with the induced policy class, with no overhead in runtime or memory requirements. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression. Compared to previous results, our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.

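The abstract does not spell out the reduction itself; in the paper it is realized by the SquareCB algorithm, which converts the regression oracle's loss predictions into an action distribution via inverse gap weighting. The sketch below illustrates one round of that selection rule under assumed names: OnlineRegressionOracle, its predict/update methods, and the exploration parameter gamma are hypothetical placeholders (the paper tunes gamma from the oracle's regret bound), not the authors' interface.

```python
import numpy as np


class OnlineRegressionOracle:
    """Hypothetical interface: any online regression algorithm over
    (context, action) pairs can be plugged in here."""

    def predict(self, context, action):
        """Return the current predicted loss of `action` in `context`."""
        raise NotImplementedError

    def update(self, context, action, loss):
        """Feed back the observed loss for the (context, action) pair."""
        raise NotImplementedError


def select_action(oracle, context, num_actions, gamma, rng=None):
    """One round of inverse-gap-weighted exploration over oracle predictions."""
    rng = rng or np.random.default_rng()
    preds = np.array([oracle.predict(context, a) for a in range(num_actions)])
    greedy = int(np.argmin(preds))  # action with the smallest predicted loss

    probs = np.zeros(num_actions)
    for a in range(num_actions):
        if a != greedy:
            # Larger predicted gap to the greedy action -> smaller probability.
            probs[a] = 1.0 / (num_actions + gamma * (preds[a] - preds[greedy]))
    probs[greedy] = 1.0 - probs.sum()  # leftover mass goes to the greedy action

    action = int(rng.choice(num_actions, p=probs))
    return action, probs


# Typical round: choose an action, observe its loss, update the oracle.
#   action, _ = select_action(oracle, x_t, K, gamma)
#   oracle.update(x_t, action, observed_loss)
```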

Bibliographic Details
Main Authors: Foster, Dylan J; Rakhlin, Alexander
Other Authors: Statistics and Data Science Center (Massachusetts Institute of Technology); Massachusetts Institute of Technology. Institute for Data, Systems, and Society; Massachusetts Institute of Technology. Laboratory for Information and Decision Systems; Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Format: Article
Language: English
Published: Proceedings of Machine Learning Research, 2021
Citation: Foster, Dylan J and Rakhlin, Alexander. 2020. "Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles." International Conference on Machine Learning, Vol. 119.
Online Access: https://hdl.handle.net/1721.1/138306; https://proceedings.mlr.press/v119/foster20a
Rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.