Methods for autonomously decomposing and performing long-horizon sequential decision tasks

Sequential decision-making over long timescales and in complex task environments is an important problem in Artificial Intelligence (AI). An effective approach to tackle this problem is to autonomously decompose a long-horizon task into a sequence of simpler subtasks or subgoals. We refer to this ap...


Bibliographic Details
Main Author:	Pateria, Shubham
Other Authors:	Quek Hiok Chai
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2022
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access:	https://hdl.handle.net/10356/155182

author	Pateria, Shubham
author2	Quek Hiok Chai
collection	NTU
description	Sequential decision-making over long timescales and in complex task environments is an important problem in Artificial Intelligence (AI). An effective approach to tackle this problem is to autonomously decompose a long-horizon task into a sequence of simpler subtasks or subgoals. We refer to this approach as Autonomous Task Decomposition (ATD) in the thesis and study it for multi-agent coordination using model-free Hierarchical Reinforcement Learning (HRL), single-agent goal-reaching using model-free HRL, and single-agent goal-reaching using model-based planning.
The objective of the thesis is to develop novel methods to address three important challenges related to ATD, which are as follows:
1. Effective multi-agent HRL under sparse global rewards and complex inter-dependencies among agents.
2. Efficient unification of autonomous subgoal discovery and single-agent HRL without slow learning.
3. Learning models for planning-based ATD that produce more rewarding and feasible plans.
In this regard, the thesis introduces three novel ATD methods as follows:
1. Inter Subtask Empowerment based Multi-agent Options (ISEMO) is introduced for effective multi-agent HRL by using auxiliary rewards that capture the inter-dependencies among HRL agents and their (handcrafted) subtasks. ISEMO leads to better coordinated performance of the inter-dependent agents on a complex Search & Rescue task, compared to a standard multi-agent HRL method.
2. End-to-End Hierarchical Reinforcement Learning with Integrated Discovery of Salient Subgoals (LIDOSS) is introduced for efficient unification of subgoal discovery and HRL for single-agent goal-reaching, by using a probability-based subgoal discovery heuristic integrated with the subgoal selection policy. LIDOSS accelerates end-to-end learning and leads to higher goal-reaching success rates compared to a state-of-the-art HRL method.
3. Finally, Learning Subgoal Graph using Value-based Subgoal Discovery and Automatic Pruning (LSGVP) is introduced to learn subgoal graph-based planning models that produce more rewarding and feasible plans for single-agent goal-reaching. LSGVP uses cumulative reward-based subgoal discovery and automatic pruning of erroneous connections in the subgoal graph. It achieves higher positive cumulative rewards and higher success rates compared to other state-of-the-art subgoal graph-based planning methods, while also being more data-efficient than model-free HRL.
first_indexed	2024-10-01T07:03:35Z
format	Thesis-Doctor of Philosophy
id	ntu-10356/155182
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T07:03:35Z
publishDate	2022
publisher	Nanyang Technological University
record_format	dspace
spelling	ntu-10356/155182 2022-03-06T05:18:16Z Pateria, Shubham Quek Hiok Chai School of Computer Science and Engineering ASHCQUEK@ntu.edu.sg Doctor of Philosophy 2022-02-11T01:29:06Z 2022-02-11T01:29:06Z 2022 Thesis-Doctor of Philosophy Pateria, S. (2022). Methods for autonomously decomposing and performing long-horizon sequential decision tasks. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/155182 10.32657/10356/155182 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
title	Methods for autonomously decomposing and performing long-horizon sequential decision tasks
topic	Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
url	https://hdl.handle.net/10356/155182


Similar Items