Mean-Variance Optimization in Markov Decision Processes

We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes th...

Full description

Bibliographic Details
Main Authors: Mannor, Shie, Tsitsiklis, John N.
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language:en_US
Published: International Machine Learning Society 2013
Online Access:http://hdl.handle.net/1721.1/79401
https://orcid.org/0000-0003-2658-8239
_version_ 1826202740880048128
author Mannor, Shie
Tsitsiklis, John N.
author2 Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Mannor, Shie
Tsitsiklis, John N.
author_sort Mannor, Shie
collection MIT
description We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for others. We finally offer pseudo-polynomial exact and approximation algorithms.
first_indexed 2024-09-23T12:16:49Z
format Article
id mit-1721.1/79401
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T12:16:49Z
publishDate 2013
publisher International Machine Learning Society
record_format dspace
spelling mit-1721.1/794012022-09-28T00:53:38Z Mean-Variance Optimization in Markov Decision Processes Mannor, Shie Tsitsiklis, John N. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Tsitsiklis, John N. We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for others. We finally offer pseudo-polynomial exact and approximation algorithms. National Science Foundation (U.S.) (grant CMMI-0856063) Israel Science Foundation (contract 890015) Technion, Israel Institute of Technology (Horeb Fellowship) 2013-07-01T20:24:53Z 2013-07-01T20:24:53Z 2011-06 Article http://purl.org/eprint/type/ConferencePaper 9781450306195 1450306195 http://hdl.handle.net/1721.1/79401 Mannor, Shie and John Tsitsiklis. "Mean-Variance Optimization in Markov Decision Processes ." in Twenty-Eighth International Conference on Machine Learning, ICML 2011, Jun. 28-Jul.2, Bellevue, Washington. 2011. https://orcid.org/0000-0003-2658-8239 en_US http://www.icml-2011.org/papers.php Proceedings of the Twenty-Eighth International Conference on Machine Learning, ICML 2011 Creative Commons Attribution-Noncommercial-Share Alike 3.0 http://creativecommons.org/licenses/by-nc-sa/3.0/ application/pdf International Machine Learning Society Tsitsiklis via Amy Stout
spellingShingle Mannor, Shie
Tsitsiklis, John N.
Mean-Variance Optimization in Markov Decision Processes
title Mean-Variance Optimization in Markov Decision Processes
title_full Mean-Variance Optimization in Markov Decision Processes
title_fullStr Mean-Variance Optimization in Markov Decision Processes
title_full_unstemmed Mean-Variance Optimization in Markov Decision Processes
title_short Mean-Variance Optimization in Markov Decision Processes
title_sort mean variance optimization in markov decision processes
url http://hdl.handle.net/1721.1/79401
https://orcid.org/0000-0003-2658-8239
work_keys_str_mv AT mannorshie meanvarianceoptimizationinmarkovdecisionprocesses
AT tsitsiklisjohnn meanvarianceoptimizationinmarkovdecisionprocesses