Deep reinforcement learning using least‐squares truncated temporal‐difference

Abstract: Policy evaluation (PE) is a critical sub‐problem in reinforcement learning: it estimates the value function for a given policy, which can then be used for policy improvement. However, current PE methods still have limitations, such as low sample efficiency and local convergence, e...

Bibliographic Details
Main Authors: Junkai Ren, Yixing Lan, Xin Xu, Yichuan Zhang, Qiang Fang, Yujun Zeng
Format: Article
Language: English
Published: Wiley 2024-04-01
Series: CAAI Transactions on Intelligence Technology
Online Access: https://doi.org/10.1049/cit2.12202