Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming

We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal state costs or Q-factors. The main difference from standard policy iteration is in the policy evaluation phase: instead of solving a linear system of equations, our algorithm...
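
For orientation, here is a minimal sketch of the classical policy iteration that the abstract contrasts against. It is not the authors' enhanced algorithm, whose evaluation phase is only partially described in the truncated abstract. In the classical method, policy evaluation solves the linear system (I - alpha * P_mu) J_mu = g_mu exactly, and the policy is then improved greedily with respect to the resulting Q-factors. The function names (`policy_iteration`, `solve_linear`), the array layout `P[u][i][j]`, and the assumption that every state shares the same action set are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy [a | b] so the caller's data is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def policy_iteration(P, g, alpha: float, max_iters: int = 100) -> Tuple[List[int], Matrix]:
    """Classical policy iteration on Q-factors (cost minimization).

    Hypothetical layout assumed here:
      P[u][i][j]  probability of moving from state i to j under action u
      g[u][i]     expected cost per stage of action u at state i
      alpha       discount factor in (0, 1)
    Returns (policy, Q), where Q[i][u] are the Q-factors of the final policy.
    """
    num_actions, n = len(P), len(P[0])
    policy = [0] * n  # arbitrary initial policy
    for _ in range(max_iters):
        # Policy evaluation: solve (I - alpha * P_mu) J = g_mu exactly.
        a = [[(1.0 if i == j else 0.0) - alpha * P[policy[i]][i][j]
              for j in range(n)] for i in range(n)]
        J = solve_linear(a, [g[policy[i]][i] for i in range(n)])
        # Q-factors of the current policy: g(i,u) + alpha * sum_j p_ij(u) J(j).
        Q = [[g[u][i] + alpha * sum(P[u][i][j] * J[j] for j in range(n))
              for u in range(num_actions)] for i in range(n)]
        # Policy improvement: greedy in Q, keeping the current action on ties
        # so the loop terminates instead of cycling between equal-cost actions.
        new_policy = []
        for i in range(n):
            best = min(range(num_actions), key=lambda u: Q[i][u])
            if Q[i][best] >= Q[i][policy[i]]:
                best = policy[i]
            new_policy.append(best)
        if new_policy == policy:
            return policy, Q
        policy = new_policy
    return policy, Q


# Toy example: action 0 = stay, action 1 = switch to the other state.
P = [[[1, 0], [0, 1]],   # stay
     [[0, 1], [1, 0]]]   # switch
g = [[2.0, 0.0],         # cost of staying in state 0 / state 1
     [1.0, 1.0]]         # cost of switching from state 0 / state 1
policy, Q = policy_iteration(P, g, alpha=0.5)
print(policy)  # [1, 0]
```

In the toy problem, staying in state 1 costs nothing, so the optimal policy switches out of state 0 once and then stays. The "linear system of equations" in the abstract is the `solve_linear` call; per the abstract, that step is what the authors' algorithm replaces.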

Bibliographic Details
Main Authors:	Bertsekas, Dimitri P.; Yu, Huizhen
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	en_US
Published:	Institute for Operations Research and the Management Sciences (INFORMS) 2019
Online Access:	https://hdl.handle.net/1721.1/121248

Similar Items

Distributed Asynchronous Policy Iteration in Dynamic Programming
by: Bertsekas, Dimitri P., et al.
Published: (2011)

Q-learning and policy iteration algorithms for stochastic shortest path problems
by: Yu, Huizhen, et al.
Published: (2015)

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems
by: Yu, Huizhen, et al.
Published: (2015)

Multiagent value iteration algorithms in dynamic programming and reinforcement learning
by: Dimitri Bertsekas
Published: (2020-12-01)

Approximate policy iteration: A survey and some new methods
by: Bertsekas, Dimitri P.
Published: (2012)

Regular Policies in Abstract Dynamic Programming
by: Bertsekas, Dimitri P
Published: (2018)

Pathologies of Temporal Difference Methods in Approximate Dynamic Programming
by: Bertsekas, Dimitri P.
Published: (2011)

Dynamic programming: deterministic and stochastic models
by: Bertsekas, Dimitri P.
Published: (1987)

6.231 Dynamic Programming and Stochastic Control, Fall 2002
by: Bertsekas, Dimitri P.
Published: (2002)

6.231 Dynamic Programming and Stochastic Control, Fall 2008
by: Bertsekas, Dimitri
Published: (2008)

6.231 Dynamic Programming and Stochastic Control, Fall 2011
by: Bertsekas, Dimitri
Published: (2011)

Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems
by: Wang, Mengdi, et al.
Published: (2015)

Basis Function Adaptation Methods for Cost Approximation in MDP
by: Yu, Huizhen, et al.
Published: (2010)

Convergence Results for Some Temporal Difference Methods Based on Least Squares
by: Yu, Huizhen, et al.
Published: (2012)

A Unifying Polyhedral Approximation Framework for Convex Optimization
by: Bertsekas, Dimitri P., et al.
Published: (2011)

On the convergence of simulation-based iterative methods for solving singular linear systems
by: Mengdi Wang, et al.
Published: (2013-01-01)

Adaptive aggregation methods for discounted dynamic programming
Published: (2003)

Discounting and climate policy
by: Van der Ploeg, R
Published: (2020)

Dynamic and stochastic control
by: Bertsekas, Dimitri P.
Published: (1976)

Newton’s method for reinforcement learning and model predictive control
by: Dimitri Bertsekas
Published: (2022-06-01)

The outlier paradox: the role of iterative ensemble coding in discounting outliers
by: Epstein, M, et al.
Published: (2020)

Discounting for Energy Transition Policies—Estimation of the Social Discount Rate for Poland
by: Monika Foltyn-Zarychta, et al.
Published: (2021-01-01)

Discounting for public policy: A survey
by: Greaves, H
Published: (2017)

The Role of Discounting in Energy Policy Investments
by: Gabriella Maselli, et al.
Published: (2021-09-01)

An optimistic value iteration for mean–variance optimization in discounted Markov decision processes
by: Shuai Ma, et al.
Published: (2022-09-01)

Incremental proximal methods for large scale convex optimization
by: Bertsekas, Dimitri P.
Published: (2012)

A unified framework for temporal difference methods
by: Bertsekas, Dimitri P.
Published: (2010)

Behavioural Economics, Hyperbolic Discounting and Environmental Policy
by: Hepburn, C, et al.
Published: (2010)

Federal Reserve System discount policy: an appraisal
by: Kareken, J. H.
Published: (2014-03-01)

Optimal Discounting and Replenishment Policies for Perishable Products
by: Chua, Geoffrey Ang, et al.
Published: (2017)

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
by: Jaakkola, Tommi, et al.
Published: (2004)

Proximal algorithms and temporal difference methods for solving fixed point problems
by: Bertsekas, Dimitri P
Published: (2021)

Control of uncertain systems with a set-membership description of the uncertainty.
by: Bertsekas, Dimitri P
Published: (2005)

New auction algorithms for the assignment problem and extensions
by: Dimitri Bertsekas
Published: (2024-03-01)

6.253 Convex Analysis and Optimization, Spring 2010
by: Bertsekas, Dimitri
Published: (2010)

6.253 Convex Analysis and Optimization, Spring 2004
by: Bertsekas, Dimitri
Published: (2004)

Discounting, Patience, and Dynamic Decision Making.
by: Quah, J, et al.
Published: (2011)

Energy, Trophic Dynamics and Ecological Discounting
by: Georgios Karakatsanis, et al.
Published: (2023-10-01)