An optimistic value iteration for mean–variance optimization in discounted Markov decision processes

This paper proposes an optimistic value iteration for steady-state mean–variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The variance metric measures long-run reward variability, with future deviations discounted to their present values. This...

Bibliographic Details
Main Authors: Shuai Ma, Xiaoteng Ma, Li Xia
Format: Article
Language: English
Published: Elsevier 2022-09-01
Series: Results in Control and Optimization
Online Access: http://www.sciencedirect.com/science/article/pii/S2666720722000388