Universal Reinforcement Learning
We consider an agent interacting with an unmodeled environment. At each time, the agent makes an observation, takes an action, and incurs a cost. Its actions can influence future observations and costs. The goal is to minimize the long-term average cost. We propose a novel algorithm, known as the ac...
Main Authors: | , , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | en_US |
Published: |
Institute of Electrical and Electronics Engineers
2010
|
Subjects: | |
Online Access: | http://hdl.handle.net/1721.1/59294 https://orcid.org/0000-0002-5856-9246 |