Online learning with sample path constraints

We study online learning where a decision maker interacts with Nature with the objective of maximizing her long-term average reward subject to sample-path average constraints. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature's choices in advance. We show that in general the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint, the convex hull turns out to be the highest attainable function. Using a calibrated forecasting rule, we provide an explicit strategy that attains this convex hull. We also measure the performance of heuristic methods based on non-calibrated forecasters in experiments involving a CPU power management problem.
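
As an informal sketch (the notation below is ours, not part of this record): if r(a, b) and c(a, b) denote the reward and constraint cost when the decision maker plays a and Nature plays b, c_0 is the constraint threshold, and \hat{q}_n is the empirical distribution of Nature's first n choices, then the reward-in-hindsight described in the abstract can be written as the value of a constrained optimization over the decision maker's mixed actions p:

\[
r^*(\hat{q}_n) \;=\; \max_{p \in \Delta(A)} \; \sum_{a \in A} \sum_{b \in B} p(a)\, \hat{q}_n(b)\, r(a,b)
\qquad \text{subject to} \qquad
\sum_{a \in A} \sum_{b \in B} p(a)\, \hat{q}_n(b)\, c(a,b) \;\le\; c_0,
\]

where \Delta(A) is the set of probability distributions over the decision maker's action set A. In this notation, the abstract's negative result is that no online strategy can guarantee r^*(\hat{q}_n) on every sample path, while its positive result is that the convex hull of r^* is attainable, and with a single constraint that hull is the highest attainable function.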

Bibliographic Details
Main Authors: Mannor, Shie; Tsitsiklis, John N.; Yu, Jia Yuan
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
Format: Article
Language: en_US
Published: MIT Press, 2010
Citation: Mannor, Shie, John N. Tsitsiklis, and Jia Yuan Yu. "Online Learning with Sample Path Constraints." Journal of Machine Learning Research 10 (2009): 569-590.
ISSN: 1532-4435
Sponsors: Canada Research Chairs Program; Natural Sciences and Engineering Research Council of Canada; National Science Foundation
Rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Online Access: http://hdl.handle.net/1721.1/51700
http://portal.acm.org/citation.cfm?id=1577069.1577089
https://orcid.org/0000-0003-2658-8239