Dynamic-depth context tree weighting
Reinforcement learning (RL) in partially observable settings is challenging be- cause the agent’s immediate observations are not Markov. Recently proposed methods can learn variable-order Markov models of the underlying process but have steep memory requirements and are sensitive to aliasing betw...
Hoofdauteurs: | , |
---|---|
Formaat: | Conference item |
Gepubliceerd in: |
Curran Associates
2018
|