Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoret...
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: |
2021
|
Online Access: | https://hdl.handle.net/1721.1/138306 |
_version_ | 1811087561979854848 |
---|---|
author | Foster, Dylan J Rakhlin, Alexander |
author2 | Statistics and Data Science Center (Massachusetts Institute of Technology) |
author_facet | Statistics and Data Science Center (Massachusetts Institute of Technology) Foster, Dylan J Rakhlin, Alexander |
author_sort | Foster, Dylan J |
collection | MIT |
description | A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoretical guarantees have remained elusive except in special cases. We provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any oracle for online regression with a given value function class into an algorithm for contextual bandits with the induced policy class, with no overhead in runtime or memory requirements. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression. Compared to previous results, our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially. |
first_indexed | 2024-09-23T13:48:03Z |
format | Article |
id | mit-1721.1/138306 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T13:48:03Z |
publishDate | 2021 |
record_format | dspace |
spelling | mit-1721.1/1383062023-02-03T20:29:28Z Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles Foster, Dylan J Rakhlin, Alexander Statistics and Data Science Center (Massachusetts Institute of Technology) Massachusetts Institute of Technology. Institute for Data, Systems, and Society Massachusetts Institute of Technology. Laboratory for Information and Decision Systems Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoretical guarantees have remained elusive except in special cases. We provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any oracle for online regression with a given value function class into an algorithm for contextual bandits with the induced policy class, with no overhead in runtime or memory requirements. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression. Compared to previous results, our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially. 2021-12-03T15:09:50Z 2021-12-03T15:09:50Z 2020 2021-12-03T15:06:05Z Article http://purl.org/eprint/type/JournalArticle https://hdl.handle.net/1721.1/138306 Foster, Dylan J and Rakhlin, Alexander. 2020. "Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles." INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 119. en https://proceedings.mlr.press/v119/foster20a INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119 Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. application/pdf Proceedings of Machine Learning Research |
spellingShingle | Foster, Dylan J Rakhlin, Alexander Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles |
title | Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles |
title_full | Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles |
title_fullStr | Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles |
title_full_unstemmed | Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles |
title_short | Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles |
title_sort | beyond ucb optimal and efficient contextual bandits with regression oracles |
url | https://hdl.handle.net/1721.1/138306 |
work_keys_str_mv | AT fosterdylanj beyonducboptimalandefficientcontextualbanditswithregressionoracles AT rakhlinalexander beyonducboptimalandefficientcontextualbanditswithregressionoracles |