Online learning with sample path constraints
We study online learning where a decision maker interacts with Nature with the objective of maximizing her long-term average reward subject to some sample path average constraints. We de ne the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the...
Main Authors: | , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | en_US |
Published: |
MIT Press
2010
|
Online Access: | http://hdl.handle.net/1721.1/51700 https://orcid.org/0000-0003-2658-8239 |
Summary: | We study online learning where a decision maker interacts with Nature with the objective
of maximizing her long-term average reward subject to some sample path average
constraints. We de ne the reward-in-hindsight as the highest reward the decision maker
could have achieved, while satisfying the constraints, had she known Nature's choices in
advance. We show that in general the reward-in-hindsight is not attainable. The convex
hull of the reward-in-hindsight function is, however, attainable. For the important case of
a single constraint, the convex hull turns out to be the highest attainable function. Using
a calibrated forecasting rule, we provide an explicit strategy that attains this convex hull.
We also measure the performance of heuristic methods based on non-calibrated forecasters
in experiments involving a CPU power management problem. |
---|