Value propagation networks
We present Value Propagation (VProp), a parameter-efficient differentiable planning module built on Value Iteration which can successfully be trained in a reinforcement learning fashion to solve unseen tasks, has the capability to generalize to larger map sizes, and can learn to navigate in dynamic...
Main authors: | , , , , , |
---|---|
Material type: | Conference item |
Language: | English |
Published: | OpenReview, 2019 |