Bayesian Nonparametric Inverse Reinforcement Learning

Inverse reinforcement learning (IRL) is the task of learning the reward function of a Markov Decision Process (MDP) given the transition function and a set of observed demonstrations in the form of state-action pairs. Current IRL algorithms attempt to find a single reward function which explains the entire observation set. In practice, this leads to a computationally costly search over a large (typically infinite) space of complex reward functions. This paper proposes the notion that if the observations can be partitioned into smaller groups, a class of much simpler reward functions can be used to explain each group. The proposed method uses a Bayesian nonparametric mixture model to automatically partition the data and find a set of simple reward functions corresponding to each partition. The simple rewards are interpreted intuitively as subgoals, which can be used to predict actions or to analyze which states are important to the demonstrator. Experimental results are given for simple examples showing comparable performance to other IRL algorithms in nominal situations. Moreover, the proposed method handles cyclic tasks (where the agent begins and ends in the same state) that would break existing algorithms without modification. Finally, the new algorithm has a fundamentally different structure than previous methods, making it more computationally efficient in a real-world learning scenario where the state space is large but the demonstration set is small.
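The abstract only sketches the approach, so the following is a minimal illustrative sketch (not the authors' implementation) of the core idea: demonstration state-action pairs are partitioned, and each partition is explained by a simple subgoal reward that is positive at a single goal state. The grid world, the two fixed candidate subgoals (the paper instead places a Bayesian nonparametric prior over an unbounded set of partitions), and the constants GAMMA, BETA, and ALPHA are assumptions made purely for illustration.

```python
# Illustrative sketch of the idea described in the abstract: partition
# demonstration state-action pairs and explain each partition with a simple
# "subgoal" reward (+1 for entering one goal state, 0 elsewhere).
# NOT the authors' algorithm: the grid world, the two fixed candidate subgoals,
# and GAMMA, BETA, ALPHA are assumptions for illustration only.
import random
from collections import Counter

import numpy as np

N = 5                                          # 5x5 grid world, states are (row, col)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GAMMA, BETA, ALPHA = 0.95, 5.0, 1.0            # discount, Boltzmann temperature, concentration

def step(s, a):
    """Deterministic grid transition, clipped at the walls."""
    return (min(max(s[0] + a[0], 0), N - 1), min(max(s[1] + a[1], 0), N - 1))

def q_values(goal):
    """Value iteration for one simple subgoal reward: +1 for entering `goal`."""
    states = [(i, j) for i in range(N) for j in range(N)]
    V = np.zeros((N, N))
    for _ in range(100):
        for s in states:
            V[s] = max(float(step(s, a) == goal) + GAMMA * V[step(s, a)] for a in ACTIONS)
    return {(s, a): float(step(s, a) == goal) + GAMMA * V[step(s, a)]
            for s in states for a in ACTIONS}

def action_likelihood(s, a, Q):
    """Boltzmann (softmax) likelihood of the demonstrated action under a subgoal's Q-values."""
    logits = np.array([BETA * Q[(s, b)] for b in ACTIONS])
    p = np.exp(logits - logits.max())
    return p[ACTIONS.index(a)] / p.sum()

# Toy demonstration: move right to (0, 4), then down toward (4, 4).
demo = [((0, 0), (0, 1)), ((0, 1), (0, 1)), ((0, 2), (0, 1)), ((0, 3), (0, 1)),
        ((0, 4), (1, 0)), ((1, 4), (1, 0)), ((2, 4), (1, 0)), ((3, 4), (1, 0))]

subgoals = [(0, 4), (4, 4)]                    # candidate simple rewards, one goal state each
Q = {g: q_values(g) for g in subgoals}
z = [random.randrange(len(subgoals)) for _ in demo]   # partition assignment per observation

# Gibbs sweeps: reassign each observation in proportion to
# (mixture prior from the other assignments) x (action likelihood under that subgoal).
for _ in range(20):
    for i, (s, a) in enumerate(demo):
        others = Counter(z[:i] + z[i + 1:])
        weights = [(others[k] + ALPHA / len(subgoals)) * action_likelihood(s, a, Q[g])
                   for k, g in enumerate(subgoals)]
        z[i] = random.choices(range(len(subgoals)), weights=weights)[0]

print("inferred subgoal for each demonstration step:", [subgoals[k] for k in z])
```

Under these assumptions, the early steps of the toy demonstration should tend to be assigned to the subgoal (0, 4) and the later steps to (4, 4), mirroring how the inferred partitions can be read as subgoals that predict actions or indicate which states matter to the demonstrator.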

Bibliographic Details
Main Authors: How, Jonathan P., Michini, Bernard J.
Other Authors: Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
Format: Article
Language: en_US
Published: Springer-Verlag, 2013
Online Access: http://hdl.handle.net/1721.1/81484
https://orcid.org/0000-0001-8576-1930
Date Issued: 2012-09
Type: Conference Paper
ISBN: 978-3-642-33485-6; 978-3-642-33486-3
ISSN: 0302-9743; 1611-3349
DOI: http://dx.doi.org/10.1007/978-3-642-33486-3_10
Published in: Machine Learning and Knowledge Discovery in Databases
Citation: Michini, Bernard, and Jonathan P. How. Bayesian Nonparametric Inverse Reinforcement Learning. Springer-Verlag, 2012.
License: Creative Commons Attribution-Noncommercial-Share Alike 3.0, http://creativecommons.org/licenses/by-nc-sa/3.0/