Learning to Plan by Learning Rules

Many environments involve following rules and tasks; for example, a chef cooking a dish follows a recipe, and a person driving follows rules of the road. People are naturally fluent with rules: we can learn rules efficiently; we can follow rules; we can interpret rules and explain them to others; and we can rapidly adjust to modified rules such as a new recipe without needing to relearn everything from scratch. By contrast, deep reinforcement learning (DRL) algorithms are ill-suited to learning policies in rule-based environments, as satisfying rules often involves executing lengthy tasks with sparse rewards. Furthermore, learned DRL policies are difficult if not impossible to interpret and are not composable. The aim of this thesis is to develop a reinforcement learning framework for rule-based environments that can efficiently learn policies that are interpretable, satisfying, and composable. We achieve interpretability by representing rules as automata or Linear Temporal Logic (LTL) formulas in a hierarchical Markov Decision Process (MDP). We achieve satisfaction by planning over the hierarchical MDP using a modified version of value iteration. We achieve composability by building on a hierarchical reinforcement learning (HRL) framework called the options framework, in which low-level options can be composed arbitrarily. Lastly, we achieve data-efficient learning by integrating our HRL framework into a Bayesian model that can infer a distribution over LTL formulas given a low-level environment and a set of expert trajectories. We demonstrate the effectiveness of our approach via a number of rule-learning and planning experiments in both simulated and real-world environments.

Bibliographic Details
Main Author: Araki, Minoru Brandon
Other Authors: Rus, Daniela
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: Ph.D.
Format: Thesis
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/139998
Rights: In Copyright - Educational Use Permitted; Copyright MIT (http://rightsstatements.org/page/InC-EDU/1.0/)
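
The abstract above outlines the core planning step: form the product of a low-level MDP with an automaton derived from an LTL rule, then run value iteration so that reward is tied to satisfying the rule rather than to hand-shaped intermediate rewards. Below is a minimal, self-contained sketch of that idea on a toy one-dimensional environment. The corridor layout, the hand-built three-state automaton, and all names are illustrative assumptions for this sketch; they are not the thesis's actual hierarchical algorithm or code.

```python
# Illustrative sketch (not the thesis's algorithm): value iteration over the
# product of a toy MDP and an automaton for the rule
# "eventually reach GOAL, and never enter HAZARD".
import numpy as np

N_CELLS = 5                 # 1-D corridor of 5 cells
ACTIONS = [-1, +1]          # move left, move right
HAZARD, GOAL = 0, 4         # cell labels referenced by the rule (assumed layout)

def automaton_step(q, cell):
    """Three-state automaton: q0 = in progress, q1 = satisfied (accepting), q2 = violated (trap)."""
    if q == 2 or cell == HAZARD:
        return 2
    if q == 1 or cell == GOAL:
        return 1
    return q  # rule still in progress

ACCEPTING = {1}
GAMMA = 0.95

# Value function over product states (cell, automaton state).
V = np.zeros((N_CELLS, 3))
for _ in range(200):
    V_new = np.zeros_like(V)
    for s in range(N_CELLS):
        for q in range(3):
            if q in ACCEPTING:
                continue  # absorbing: the rule is already satisfied, value stays 0
            best = -np.inf
            for a in ACTIONS:
                s2 = min(max(s + a, 0), N_CELLS - 1)  # deterministic move, clipped to the corridor
                q2 = automaton_step(q, s2)            # automaton reads the cell the agent moves into
                r = 1.0 if q2 in ACCEPTING else 0.0   # reward only when the rule becomes satisfied
                best = max(best, r + GAMMA * V[s2, q2])
            V_new[s, q] = best
    V = V_new

# Values of the "rule in progress" layer increase toward GOAL; a greedy policy
# with respect to V walks toward GOAL without stepping onto HAZARD.
print(np.round(V[:, 0], 3))
```

In this sketch the automaton plays the role the thesis assigns to LTL rules: the rule, not a hand-tuned reward function, determines where reward appears, and the product construction keeps the resulting policy interpretable in terms of which automaton state the agent is in.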