Distributed Off-Policy Temporal Difference Learning Using Primal-Dual Method

This paper provides a theoretical analysis of, and additional insights into, a distributed temporal-difference (TD) learning algorithm for multi-agent Markov decision processes (MDPs) from a saddle-point viewpoint. Single-agent TD learning is a reinforcement learning (RL) algorithm f...

Bibliographic Details
Main Authors:	Donghwan Lee, Do Wan Kim, Jianghai Hu
Format:	Article
Language:	English
Published:	IEEE 2022-01-01
Series:	IEEE Access
Subjects:	Reinforcement learning (RL); multi-agent systems; convergence; temporal difference (TD) learning; machine learning; primal-dual method
Online Access:	https://ieeexplore.ieee.org/document/9906992/

Similar Items