Distributed Off-Policy Temporal Difference Learning Using Primal-Dual Method
The goal of this paper is to provide a theoretical analysis of, and additional insights into, a distributed temporal-difference (TD) learning algorithm for multi-agent Markov decision processes (MDPs) from a saddle-point viewpoint. Single-agent TD-learning is a reinforcement learning (RL) algorithm f...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2022-01-01 |
Series: | IEEE Access |
Online Access: | https://ieeexplore.ieee.org/document/9906992/ |