Universal Reinforcement Learning

We consider an agent interacting with an unmodeled environment. At each time, the agent makes an observation, takes an action, and incurs a cost. Its actions can influence future observations and costs. The goal is to minimize the long-term average cost. We propose a novel algorithm, known as the ac...
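The interaction loop in the abstract (observe, act, incur a cost, with actions shaping future observations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. The environment (`toy_observe`, `toy_cost`), the binary observation and action spaces, and the fixed policies are all hypothetical. The reported quantity is the empirical average (1/T)·Σ cost over a finite horizon T, which stands in for the long-term average cost.

```python
import random
from typing import Callable, List, Tuple

# Past (observation, action) pairs; the agent may condition on the whole history.
History = List[Tuple[int, int]]
Policy = Callable[[History, int], int]


def toy_observe(history: History, rng: random.Random, noise: float) -> int:
    """Hypothetical environment: the next observation usually echoes the last
    action, so actions influence future observations."""
    if not history or rng.random() < noise:
        return rng.randint(0, 1)
    return history[-1][1]


def toy_cost(obs: int, action: int) -> float:
    """Hypothetical cost: 0 when the action matches the observation, else 1."""
    return 0.0 if action == obs else 1.0


def average_cost(policy: Policy, horizon: int, noise: float = 0.1, seed: int = 0) -> float:
    """Run the observe/act/cost loop for `horizon` steps and return the
    empirical average cost, a finite-horizon proxy for long-term average cost."""
    rng = random.Random(seed)
    history: History = []
    total = 0.0
    for _ in range(horizon):
        obs = toy_observe(history, rng, noise)  # agent makes an observation
        action = policy(history, obs)           # agent takes an action
        total += toy_cost(obs, action)          # agent incurs a cost
        history.append((obs, action))
    return total / horizon


# Two fixed (non-learning) hypothetical policies for comparison.
copy_policy: Policy = lambda history, obs: obs
flip_policy: Policy = lambda history, obs: 1 - obs
```

For example, `average_cost(copy_policy, 1000)` evaluates to `0.0` and `average_cost(flip_policy, 1000)` to `1.0`. Both policies here are hard-coded. A learning agent facing an unmodeled environment would have to discover a low-cost behavior from observed costs alone, which is the problem the abstract addresses.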

Bibliographic Details
Main Authors:	Farias, Vivek F., Moallemi, Ciamac C., Van Roy, Benjamin, Weissman, Tsachy
Other Authors:	Sloan School of Management
Format:	Article
Language:	en_US
Published:	Institute of Electrical and Electronics Engineers 2010
Subjects:	value iteration; reinforcement learning; optimal control; dynamic programming; Lempel-Ziv; context tree
Online Access:	http://hdl.handle.net/1721.1/59294 https://orcid.org/0000-0002-5856-9246
