Maximizing information gain in partially observable environments via prediction rewards

Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem in which the reward depends on the agent’s uncertainty. For example, the reward can be the negative entropy of the agent’s belief over an unknown (or hidden) variable. Typically, the...
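
As an illustration of the entropy-based reward mentioned in the abstract, the following minimal sketch computes the negative Shannon entropy of a discrete belief. The function name `negative_entropy_reward`, the list representation of the belief, and the use of natural logarithms are assumptions made for this example; they are not taken from the paper.

```python
import math


def negative_entropy_reward(belief):
    """Return the negative Shannon entropy (in nats) of a discrete belief.

    `belief` is a hypothetical representation: a list of non-negative
    weights over the values of the hidden variable. It is normalized here
    so that unnormalized weights are also accepted.
    """
    total = sum(belief)
    if total <= 0:
        raise ValueError("belief must have positive total mass")
    probs = [p / total for p in belief]
    # Negative entropy is sum(p * log p); terms with p == 0 contribute 0.
    return sum(p * math.log(p) for p in probs if p > 0)


# A uniform belief is maximally uncertain and gets the lowest reward;
# a peaked belief is more certain and gets a reward closer to zero.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]
print(negative_entropy_reward(uniform))
print(negative_entropy_reward(peaked))
```

Under this reward, an agent is paid more the more sharply its belief concentrates on a single value of the hidden variable, which is the sense in which the reward depends on the agent’s uncertainty.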

Bibliographic Details
Main authors: Satsangi, Y; Lim, S; Whiteson, S; Oliehoek, FA; White, M
Format: Conference item
Language: English
Published: International Foundation for Autonomous Agents and Multiagent Systems, 2020