Stick-breaking policy learning in Dec-POMDPs
Main Authors:
Other Authors:
Format: Article
Language: en_US
Published: International Joint Conferences on Artificial Intelligence, Inc., 2016
Online Access: http://hdl.handle.net/1721.1/104918 https://orcid.org/0000-0002-1648-8325 https://orcid.org/0000-0001-8576-1930
Summary: Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from the optimal value. This paper represents the local policy of each agent using variable-sized FSCs that are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
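The stick-breaking prior mentioned in the summary is what lets the number of controller nodes per agent be inferred rather than fixed in advance. As a rough, hypothetical illustration (not the authors' code or the Dec-SBPR algorithm itself), the Python sketch below draws truncated stick-breaking (GEM) weights and shows how the concentration parameter governs how many nodes carry appreciable probability mass; the function name and the truncation level are assumptions made here for the example.

```python
import numpy as np

def stick_breaking_weights(alpha, max_nodes, seed=None):
    """Draw truncated stick-breaking (GEM) weights.

    Each weight is the fraction of the remaining "stick" broken off at
    that step: v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k}(1 - v_j).
    Smaller alpha concentrates mass on fewer components, which is the
    intuition behind letting the controller size adapt to the data.
    """
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=max_nodes)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

# Example: count how many nodes are needed to cover 90% of the mass.
for alpha in (0.5, 5.0):
    w = stick_breaking_weights(alpha, max_nodes=20, seed=0)
    k = int(np.argmax(np.cumsum(w) >= 0.9)) + 1
    print(f"alpha={alpha}: {k} nodes cover 90% of the prior mass")
```

In Dec-SBPR this kind of prior is placed over FSC node assignments and the posterior over controller sizes is then approximated with variational Bayes, as described in the summary; the snippet only illustrates the prior's shrinking-weight behavior.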