Distributed Off-Policy Temporal Difference Learning Using Primal-Dual Method
The goal of this paper is to provide theoretical analysis and additional insights into a distributed temporal-difference (TD) learning algorithm for multi-agent Markov decision processes (MDPs) from a saddle-point viewpoint. (Single-agent) TD-learning is a reinforcement learning (RL) algorithm f...
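The abstract is truncated in this record, but for context: TD-learning evaluates a policy's value function from sampled transitions. A minimal tabular TD(0) sketch is shown below for illustration only; it is the classic single-agent update, not the distributed primal-dual algorithm the paper analyzes, and the toy two-state chain is an assumption of this example.

```python
import random

def td0_policy_evaluation(n_states, transitions, alpha=0.1, gamma=0.9,
                          episodes=5000, seed=0):
    """Tabular TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).

    `transitions[s]` is a list of (next_state, reward) pairs sampled
    uniformly; `next_state` is None at a terminal transition.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states
    for _ in range(episodes):
        s = 0  # every episode starts in state 0
        while s is not None:
            s_next, r = transitions[s][rng.randrange(len(transitions[s]))]
            # Bootstrapped TD target; terminal states contribute no future value.
            target = r + (gamma * V[s_next] if s_next is not None else 0.0)
            V[s] += alpha * (target - V[s])
            s = s_next
    return V

# Hypothetical two-state chain: 0 -> 1 (reward 0), 1 -> terminal (reward 1).
chain = {0: [(1, 0.0)], 1: [(None, 1.0)]}
V = td0_policy_evaluation(2, chain)
```

With gamma = 0.9 the estimates approach V(1) = 1 and V(0) = 0.9, the exact values for this deterministic chain.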
Main Authors: Donghwan Lee, Do Wan Kim, Jianghai Hu
Format: Article
Language: English
Published: IEEE, 2022-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9906992/
Similar Items
- Learning to play against any mixture of opponents
  by: Max Olan Smith, et al.
  Published: (2023-07-01)
- Dual-Layer Q-Learning Strategy for Energy Management of Battery Storage in Grid-Connected Microgrids
  by: Khawaja Haider Ali, et al.
  Published: (2023-01-01)
- Relaxed Variable Metric Primal-Dual Fixed-Point Algorithm with Applications
  by: Wenli Huang, et al.
  Published: (2022-11-01)
- Primal-Dual Method of Solving Convex Quadratic Programming Problems
  by: V. Moraru
  Published: (2000-10-01)
- Primal-Dual Splitting Algorithms for Solving Structured Monotone Inclusion with Applications
  by: Jinjian Chen, et al.
  Published: (2021-12-01)