Distributed Off-Policy Temporal Difference Learning Using Primal-Dual Method

The goal of this paper is to provide theoretical analysis and additional insights on a distributed temporal-difference (TD)-learning algorithm for multi-agent Markov decision processes (MDPs) from a saddle-point viewpoint. Single-agent TD-learning is a reinforcement learning (RL) algorithm f...

Bibliographic Details
Main Authors: Donghwan Lee, Do Wan Kim, Jianghai Hu
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9906992/

Similar Items