Deep reinforcement learning using least‐squares truncated temporal‐difference
Abstract: Policy evaluation (PE) is a critical sub‐problem in reinforcement learning, which estimates the value function for a given policy and can be used for policy improvement. However, there still exist some limitations in current PE methods, such as low sample efficiency and local convergence, e...
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2024-04-01 |
Series: | CAAI Transactions on Intelligence Technology |
Online Access: | https://doi.org/10.1049/cit2.12202 |