An optimistic value iteration for mean–variance optimization in discounted Markov decision processes
This paper proposes an optimistic value iteration for steady-state mean–variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The involved variance metric concerns reward variability in the long run, and future deviations are discounted to their present values. This...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2022-09-01 |
Series: | Results in Control and Optimization |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666720722000388 |