Mean-Variance Optimization in Markov Decision Processes
We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes th...
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Article |
Language: | en_US |
Published: |
International Machine Learning Society
2013
|
Online Access: | http://hdl.handle.net/1721.1/79401 https://orcid.org/0000-0003-2658-8239 |
_version_ | 1826202740880048128 |
---|---|
author | Mannor, Shie Tsitsiklis, John N. |
author2 | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
author_facet | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Mannor, Shie Tsitsiklis, John N. |
author_sort | Mannor, Shie |
collection | MIT |
description | We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for others. We finally offer pseudo-polynomial exact and approximation algorithms. |
first_indexed | 2024-09-23T12:16:49Z |
format | Article |
id | mit-1721.1/79401 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T12:16:49Z |
publishDate | 2013 |
publisher | International Machine Learning Society |
record_format | dspace |
spelling | mit-1721.1/794012022-09-28T00:53:38Z Mean-Variance Optimization in Markov Decision Processes Mannor, Shie Tsitsiklis, John N. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Tsitsiklis, John N. We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for others. We finally offer pseudo-polynomial exact and approximation algorithms. National Science Foundation (U.S.) (grant CMMI-0856063) Israel Science Foundation (contract 890015) Technion, Israel Institute of Technology (Horeb Fellowship) 2013-07-01T20:24:53Z 2013-07-01T20:24:53Z 2011-06 Article http://purl.org/eprint/type/ConferencePaper 9781450306195 1450306195 http://hdl.handle.net/1721.1/79401 Mannor, Shie and John Tsitsiklis. "Mean-Variance Optimization in Markov Decision Processes ." in Twenty-Eighth International Conference on Machine Learning, ICML 2011, Jun. 28-Jul.2, Bellevue, Washington. 2011. https://orcid.org/0000-0003-2658-8239 en_US http://www.icml-2011.org/papers.php Proceedings of the Twenty-Eighth International Conference on Machine Learning, ICML 2011 Creative Commons Attribution-Noncommercial-Share Alike 3.0 http://creativecommons.org/licenses/by-nc-sa/3.0/ application/pdf International Machine Learning Society Tsitsiklis via Amy Stout |
spellingShingle | Mannor, Shie Tsitsiklis, John N. Mean-Variance Optimization in Markov Decision Processes |
title | Mean-Variance Optimization in Markov Decision Processes |
title_full | Mean-Variance Optimization in Markov Decision Processes |
title_fullStr | Mean-Variance Optimization in Markov Decision Processes |
title_full_unstemmed | Mean-Variance Optimization in Markov Decision Processes |
title_short | Mean-Variance Optimization in Markov Decision Processes |
title_sort | mean variance optimization in markov decision processes |
url | http://hdl.handle.net/1721.1/79401 https://orcid.org/0000-0003-2658-8239 |
work_keys_str_mv | AT mannorshie meanvarianceoptimizationinmarkovdecisionprocesses AT tsitsiklisjohnn meanvarianceoptimizationinmarkovdecisionprocesses |