Online learning with sample path constraints
We study online learning where a decision maker interacts with Nature with the objective of maximizing her long-term average reward subject to some sample path average constraints. We de ne the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the...
Main Authors: | , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | en_US |
Published: |
MIT Press
2010
|
Online Access: | http://hdl.handle.net/1721.1/51700 https://orcid.org/0000-0003-2658-8239 |