Stick-breaking policy learning in Dec-POMDPs

Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from the optimal value. This paper repres...

Bibliographic Details
Main Authors:	Amato, Christopher; Liao, Xuejun; Carin, Lawrence; Liu, Miao; How, Jonathan P
Other Authors:	Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
Format:	Article
Language:	en_US
Published:	International Joint Conferences on Artificial Intelligence, Inc. 2016
Online Access:	http://hdl.handle.net/1721.1/104918 https://orcid.org/0000-0002-1648-8325 https://orcid.org/0000-0001-8576-1930

Similar Items