Learning for multi-robot cooperation in partially observable stochastic environments with macro-actions
This paper presents a data-driven approach for multi-robot coordination in partially observable domains based on Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a general framework for cooperative sequential decision making under uncertainty, and MAs allow temporally extended and asynchronous action execution. To date, most methods assume the underlying Dec-POMDP model is known a priori or that a full simulator is available during planning. Previous methods that aim to address these issues suffer from local optimality and sensitivity to initial conditions. Additionally, few hardware demonstrations involving a large team of heterogeneous robots and long planning horizons exist. This work addresses these gaps by proposing an iterative sampling-based Expectation-Maximization algorithm (iSEM) to learn policies using only trajectory data containing observations, MAs, and rewards. Our experiments show the algorithm achieves better solution quality than state-of-the-art learning-based methods. We implement two variants of multi-robot Search and Rescue (SAR) domains (with and without obstacles) on hardware to demonstrate that the learned policies can effectively control a team of distributed robots to cooperate in a partially observable stochastic environment.
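For readers skimming the record, the following is a minimal, hypothetical sketch of the kind of EM-based policy learning the abstract describes: fitting a small finite-state controller (FSC) to reward-weighted trajectories of observations, macro-actions, and rewards. It is not the authors' iSEM algorithm (which, per the abstract, adds iterative sampling to escape local optima and handles multiple agents); every size, name, and the simple reward-weighting scheme here are assumptions.

```python
# Illustrative sketch only -- NOT the authors' iSEM implementation.
# Shows the general shape of EM-based policy learning from trajectories
# of (observation, macro-action, reward) tuples with a small finite-state
# controller (FSC). All sizes, names, and the reward weighting are assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_NODES, N_OBS, N_MAS = 3, 4, 2  # hypothetical FSC nodes, observations, macro-actions

# FSC parameters: action policy pi(a|q) and node transitions eta(q'|q,o)
pi = rng.dirichlet(np.ones(N_MAS), size=N_NODES)              # shape (N_NODES, N_MAS)
eta = rng.dirichlet(np.ones(N_NODES), size=(N_NODES, N_OBS))  # shape (N_NODES, N_OBS, N_NODES)

def em_step(trajectories, pi, eta):
    """One EM iteration: weight each trajectory by its (shifted) return,
    then accumulate expected action and node-transition counts along it."""
    pi_counts = np.full_like(pi, 1e-6)    # small prior avoids zero rows
    eta_counts = np.full_like(eta, 1e-6)
    returns = np.array([sum(r for _, _, r in tau) for tau in trajectories])
    weights = returns - returns.min() + 1e-6  # shift so all weights are positive
    for w, tau in zip(weights, trajectories):
        alpha = np.full(N_NODES, 1.0 / N_NODES)  # belief over controller nodes
        for obs, act, _ in tau:
            resp = alpha * pi[:, act]            # E-step: node responsibilities
            resp /= resp.sum()
            pi_counts[:, act] += w * resp
            trans = resp[:, None] * eta[:, obs, :]  # expected node transitions
            eta_counts[:, obs, :] += w * trans
            alpha = trans.sum(axis=0)
    # M-step: renormalize expected counts into distributions
    return (pi_counts / pi_counts.sum(axis=1, keepdims=True),
            eta_counts / eta_counts.sum(axis=2, keepdims=True))

# Toy trajectories of (observation, macro-action, reward) tuples.
trajs = [[(int(rng.integers(N_OBS)), int(rng.integers(N_MAS)), float(rng.normal()))
          for _ in range(5)] for _ in range(20)]
for _ in range(10):
    pi, eta = em_step(trajs, pi, eta)
```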
Main Authors: | Amato, Christopher; Liu, Miao; Sivakumar, Kavinayan P; Omidshafiei, Shayegan; How, Jonathan P |
---|---|
Other Authors: | Massachusetts Institute of Technology. Department of Aeronautics and Astronautics |
Format: | Article |
Published: | Institute of Electrical and Electronics Engineers (IEEE), 2018 |
Online Access: | http://hdl.handle.net/1721.1/114739 https://orcid.org/0000-0002-1648-8325 https://orcid.org/0000-0003-0903-0137 https://orcid.org/0000-0001-8576-1930 |
Field | Value
---|---
author | Amato, Christopher; Liu, Miao; Sivakumar, Kavinayan P; Omidshafiei, Shayegan; How, Jonathan P
author2 | Massachusetts Institute of Technology. Department of Aeronautics and Astronautics |
collection | MIT |
description | This paper presents a data-driven approach for multi-robot coordination in partially observable domains based on Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a general framework for cooperative sequential decision making under uncertainty, and MAs allow temporally extended and asynchronous action execution. To date, most methods assume the underlying Dec-POMDP model is known a priori or that a full simulator is available during planning. Previous methods that aim to address these issues suffer from local optimality and sensitivity to initial conditions. Additionally, few hardware demonstrations involving a large team of heterogeneous robots and long planning horizons exist. This work addresses these gaps by proposing an iterative sampling-based Expectation-Maximization algorithm (iSEM) to learn policies using only trajectory data containing observations, MAs, and rewards. Our experiments show the algorithm achieves better solution quality than state-of-the-art learning-based methods. We implement two variants of multi-robot Search and Rescue (SAR) domains (with and without obstacles) on hardware to demonstrate that the learned policies can effectively control a team of distributed robots to cooperate in a partially observable stochastic environment. |
format | Article |
id | mit-1721.1/114739 |
institution | Massachusetts Institute of Technology |
publishDate | 2018 |
publisher | Institute of Electrical and Electronics Engineers (IEEE) |
record_format | dspace |
departments | Massachusetts Institute of Technology. Department of Aeronautics and Astronautics; Department of Mechanical Engineering; Laboratory for Information and Decision Systems
type | Conference Paper (http://purl.org/eprint/type/ConferencePaper)
isbn | 978-1-5386-2682-5; 978-1-5386-2681-8; 978-1-5386-2683-2
issn | 2153-0866
citation | Liu, Miao, Kavinayan Sivakumar, Shayegan Omidshafiei, Christopher Amato, and Jonathan P. How. "Learning for Multi-Robot Cooperation in Partially Observable Stochastic Environments with Macro-Actions." 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, September 2017.
doi | http://dx.doi.org/10.1109/IROS.2017.8206001
rights | Creative Commons Attribution-Noncommercial-Share Alike (http://creativecommons.org/licenses/by-nc-sa/4.0/)
source | arXiv
title | Learning for multi-robot cooperation in partially observable stochastic environments with macro-actions |
url | http://hdl.handle.net/1721.1/114739 https://orcid.org/0000-0002-1648-8325 https://orcid.org/0000-0003-0903-0137 https://orcid.org/0000-0001-8576-1930 |