Stick-breaking policy learning in Dec-POMDPs

Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from the optimal value. This paper represents the local policy of each agent using variable-sized FSCs that are constructed using a stick-breaking prior, leading to a new framework called the decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
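
For context on the construction named in the abstract: a stick-breaking prior builds a distribution over a countably infinite set of components by repeatedly breaking Beta-distributed fractions off a unit-length stick, so component k receives weight pi_k = v_k * prod_{j<k} (1 - v_j) with v_k ~ Beta(1, alpha). Because the weights decay stochastically, posterior inference can concentrate mass on a small set of FSC nodes, which is what lets Dec-SBPR infer controller sizes rather than fix them in advance. A minimal sketch of the standard truncated construction follows; the hyperparameter alpha and the truncation level are illustrative, since this record does not state the paper's settings.

import numpy as np

def stick_breaking_weights(alpha, truncation, rng=None):
    # Sample truncated stick-breaking weights:
    #   v_k ~ Beta(1, alpha),  pi_k = v_k * prod_{j<k} (1 - v_j).
    rng = np.random.default_rng() if rng is None else rng
    v = rng.beta(1.0, alpha, size=truncation)
    # Length of stick remaining before each break (1 for the first break).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

# Example: a prior over 25 candidate controller nodes; most mass
# falls on the first few, favoring small controllers a priori.
weights = stick_breaking_weights(alpha=2.0, truncation=25)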

Bibliographic Details
Main Authors: Amato, Christopher; Liao, Xuejun; Carin, Lawrence; Liu, Miao; How, Jonathan P
Other Authors: Massachusetts Institute of Technology. Department of Aeronautics and Astronautics; Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
Format: Article
Language: en_US
Published: International Joint Conferences on Artificial Intelligence, Inc., 2016
Online Access: http://hdl.handle.net/1721.1/104918
ORCID: https://orcid.org/0000-0002-1648-8325
ORCID: https://orcid.org/0000-0001-8576-1930
Type: Conference Paper
Citation: Liu, Miao et al. "Stick-Breaking Policy Learning in Dec-POMDPs." International Joint Conference on Artificial Intelligence, July 25-31, 2015, Buenos Aires, Argentina.
Conference: International Joint Conference on Artificial Intelligence (http://ijcai-15.org/index.php/accepted-papers)
Date Issued: 2015-07
Funding: United States. Office of Naval Research. Multidisciplinary University Research Initiative (Award N000141110688); National Science Foundation (U.S.) (Award 1463945)
Rights: Creative Commons Attribution-Noncommercial-Share Alike (http://creativecommons.org/licenses/by-nc-sa/4.0/)