Dynamic-depth context tree weighting

Reinforcement learning (RL) in partially observable settings is challenging be- cause the agent’s immediate observations are not Markov. Recently proposed methods can learn variable-order Markov models of the underlying process but have steep memory requirements and are sensitive to aliasing betw...

Mô tả đầy đủ

Chi tiết về thư mục
Những tác giả chính: Messias, J, Whiteson, S
Định dạng: Conference item
Được phát hành: Curran Associates 2018