Stop! planner time: metareasoning for probabilistic planning using learned performance profiles

Bibliographic Details
Main Authors:	Budd, M; Lacerda, B; Hawes, N
Format:	Conference item
Language:	English
Published:	Association for the Advancement of Artificial Intelligence 2024

collection	OXFORD
description	The metareasoning framework aims to enable autonomous agents to factor in planning costs when making decisions. In this work, we develop the first non-myopic metareasoning algorithm for planning with Markov decision processes. Our method learns the behaviour of anytime probabilistic planning algorithms from performance data. Specifically, we propose a novel model for metareasoning, based on contextual performance profiles that predict the value of the planner’s current solution given the time spent planning, the state of the planning algorithm’s internal parameters, and the difficulty of the planning problem being solved. This model removes the need to assume that the current solution quality is always known, broadening the class of metareasoning problems that can be addressed. We then employ deep reinforcement learning to learn a policy that decides, at each timestep, whether to continue planning or start executing the current plan, and how to set hyperparameters of the planner to enhance its performance. We demonstrate our algorithm’s ability to perform effective metareasoning in two domains.
first_indexed	2024-03-07T08:29:11Z
id	oxford-uuid:71b8f60e-519f-4f5f-b8bb-ae18a67469e7
institution	University of Oxford
last_indexed	2024-09-25T04:29:51Z
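
The abstract describes a stopping decision driven by a contextual performance profile: a model that predicts the value of the planner's current solution from the time spent planning, the planner's internal parameters, and the problem's difficulty. The Python below is a minimal sketch under loudly stated assumptions, not the authors' method. The profile is a made-up sigmoid rather than a model learned from performance data, the time cost is assumed to be linear, and the non-myopic decision is an exhaustive look-ahead over a deterministic profile standing in for the deep reinforcement learning policy the authors train. `Context`, `predicted_value`, `myopic_stop_time`, `lookahead_stop_time`, `choose_setting` and the `exploration` hyperparameter are all hypothetical names. As in the abstract, the decision uses the predicted solution value, not an observed solution quality.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical context features for a contextual performance profile."""
    difficulty: float   # assumed problem-difficulty feature in [0, 1]
    exploration: float  # stand-in for a tunable planner hyperparameter


def predicted_value(t: int, ctx: Context) -> float:
    """Hypothetical profile: predicted value of the planner's current solution
    after t planning steps. The sigmoid shape means early steps gain little."""
    rate = 1.0 / (1.0 + 2.0 * ctx.difficulty)  # harder problems improve more slowly
    # Made-up effect: exploration = 1.0 is the "sweet spot" in this toy model.
    midpoint = 6.0 + 10.0 * ctx.difficulty + 3.0 * (ctx.exploration - 1.0) ** 2
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))


def utility(t: int, ctx: Context, cost: float) -> float:
    """Assumed linear time cost: predicted value minus a cost per planning step."""
    return predicted_value(t, ctx) - cost * t


def myopic_stop_time(ctx: Context, cost: float, horizon: int) -> int:
    """Myopic rule: stop at the first step where one more step is not predicted to pay off."""
    t = 0
    while t < horizon and utility(t + 1, ctx, cost) > utility(t, ctx, cost):
        t += 1
    return t


def lookahead_stop_time(ctx: Context, cost: float, horizon: int) -> int:
    """Non-myopic stand-in: compare every stop time up to the horizon."""
    return max(range(horizon + 1), key=lambda s: utility(s, ctx, cost))


def choose_setting(difficulty: float, candidates, cost: float, horizon: int):
    """Pick the hyperparameter value whose best stop time yields the most utility.
    Returns (exploration, stop_time, utility)."""
    best = None
    for e in candidates:
        ctx = Context(difficulty, e)
        t = lookahead_stop_time(ctx, cost, horizon)
        u = utility(t, ctx, cost)
        if best is None or u > best[2]:
            best = (e, t, u)
    return best


if __name__ == "__main__":
    ctx = Context(difficulty=0.5, exploration=1.0)
    print("myopic stop:", myopic_stop_time(ctx, cost=0.02, horizon=30))         # myopic stop: 0
    print("look-ahead stop:", lookahead_stop_time(ctx, cost=0.02, horizon=30))  # look-ahead stop: 17
    print("best setting:", choose_setting(0.5, [0.0, 1.0, 2.0], 0.02, 30))
```

With these toy numbers the myopic rule stops immediately, because each early planning step is predicted to gain less than it costs, while the look-ahead waits through the plateau and stops at step 17 with far higher utility. That gap is the kind of failure a non-myopic approach is meant to avoid; in this sketch the look-ahead is only feasible because the profile is deterministic and known in closed form.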

Similar Items