Online learning with sample path constraints

We study online learning where a decision maker interacts with Nature with the objective of maximizing her long-term average reward subject to some sample path average constraints. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the...

Bibliographic details
Main authors:	Mannor, Shie; Tsitsiklis, John N.; Yu, Jia Yuan
Other authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	en_US
Published:	MIT Press 2010
Online access:	http://hdl.handle.net/1721.1/51700 https://orcid.org/0000-0003-2658-8239
