Q-learning with nearest neighbors

© 2018 Curran Associates Inc.All rights reserved. We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is ava...

Full description

Bibliographic Details
Main Authors: Shah, Devavrat, Xie, Qiaomin
Other Authors: Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
Format: Article
Language:English
Published: 2021
Online Access:https://hdl.handle.net/1721.1/137946