Online learning with sample path constraints
We study online learning where a decision maker interacts with Nature with the objective of maximizing her long-term average reward subject to some sample path average constraints. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the...
Main Authors: | Mannor, Shie; Tsitsiklis, John N.; Yu, Jia Yuan |
---|---|
Other Authors: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
Format: | Article |
Language: | en_US |
Published: | MIT Press, 2010 |
Online Access: | http://hdl.handle.net/1721.1/51700 https://orcid.org/0000-0003-2658-8239 |
author | Mannor, Shie; Tsitsiklis, John N.; Yu, Jia Yuan |
---|---|
author2 | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
collection | MIT |
description | We study online learning where a decision maker interacts with Nature with the objective of maximizing her long-term average reward subject to some sample path average constraints. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature's choices in advance. We show that in general the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint, the convex hull turns out to be the highest attainable function. Using a calibrated forecasting rule, we provide an explicit strategy that attains this convex hull. We also measure the performance of heuristic methods based on non-calibrated forecasters in experiments involving a CPU power management problem. (A hedged formal sketch of the reward-in-hindsight is given after the record fields below.) |
format | Article |
id | mit-1721.1/51700 |
institution | Massachusetts Institute of Technology |
language | en_US |
publishDate | 2010 |
publisher | MIT Press |
record_format | dspace |
author2 | Massachusetts Institute of Technology. Laboratory for Information and Decision Systems |
sponsorship | Canada Research Chairs Program; Natural Sciences and Engineering Research Council of Canada; National Science Foundation |
issn | 1532-4435 |
citation | Mannor, Shie, John N. Tsitsiklis, and Jia Yuan Yu. "Online Learning with Sample Path Constraints." Journal of Machine Learning Research 10 (2009): 569-590. |
journal | Journal of Machine Learning Research (http://portal.acm.org/citation.cfm?id=1577069.1577089) |
rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. |
title | Online learning with sample path constraints |
url | http://hdl.handle.net/1721.1/51700 https://orcid.org/0000-0003-2658-8239 |
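The abstract's central object, the reward-in-hindsight, can be pinned down with a short worked definition. The sketch below uses generic notation assumed for illustration only (a finite action set A for the decision maker, a finite set B for Nature, per-stage reward r(a,b) and constraint c(a,b), threshold c_0, and the empirical distribution q of Nature's choices); it follows the abstract's verbal description rather than the paper's exact formulation.

```latex
% Hedged sketch of the reward-in-hindsight; all symbols below are
% illustrative assumptions, not necessarily the paper's own notation.
%   A         : decision maker's finite action set
%   B         : Nature's finite action set
%   r(a,b)    : per-stage reward;  c(a,b) : per-stage constraint
%   c_0       : constraint threshold
%   \Delta(A) : mixed actions (probability distributions over A)
%   q         : empirical distribution of Nature's choices b_1,\dots,b_T
\[
  r^{*}(q) \;=\;
  \max_{p \in \Delta(A)}
  \sum_{a \in A} \sum_{b \in B} p(a)\, q(b)\, r(a,b)
  \quad\text{subject to}\quad
  \sum_{a \in A} \sum_{b \in B} p(a)\, q(b)\, c(a,b) \;\le\; c_{0}.
\]
```

Read this way, the abstract's claims say that no online strategy can guarantee an average reward approaching r*(q) against every sequence of Nature's choices, whereas the convex hull of r*(·) is attainable, and in the single-constraint case that convex hull is the highest attainable function.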