Stop! planner time: metareasoning for probabilistic planning using learned performance profiles
The metareasoning framework aims to enable autonomous agents to factor in planning costs when making decisions. In this work, we develop the first non-myopic metareasoning algorithm for planning with Markov decision processes. Our method learns the behaviour of anytime probabilistic planning algorit...
Main Authors: | Budd, M, Lacerda, B, Hawes, N |
---|---|
Format: | Conference item |
Language: | English |
Published: | Association for the Advancement of Artificial Intelligence, 2024 |
_version_ | 1811140933083725824 |
---|---|
author | Budd, M Lacerda, B Hawes, N |
collection | OXFORD |
description | The metareasoning framework aims to enable autonomous agents to factor in planning costs when making decisions. In this work, we develop the first non-myopic metareasoning algorithm for planning with Markov decision processes. Our method learns the behaviour of anytime probabilistic planning algorithms from performance data. Specifically, we propose a novel model for metareasoning, based on contextual performance profiles that predict the value of the planner’s current solution given the time spent planning, the state of the planning algorithm’s internal parameters, and the difficulty of the planning problem being solved. This model removes the need to assume that the current solution quality is always known, broadening the class of metareasoning problems that can be addressed. We then employ deep reinforcement learning to learn a policy that decides, at each timestep, whether to continue planning or start executing the current plan, and how to set hyperparameters of the planner to enhance its performance. We demonstrate our algorithm’s ability to perform effective metareasoning in two domains. |
first_indexed | 2024-03-07T08:29:11Z |
format | Conference item |
id | oxford-uuid:71b8f60e-519f-4f5f-b8bb-ae18a67469e7 |
institution | University of Oxford |
language | English |
last_indexed | 2024-09-25T04:29:51Z |
record_format | dspace |
title | Stop! planner time: metareasoning for probabilistic planning using learned performance profiles |
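
The stop-or-continue decision described in the abstract can be illustrated with a much simpler toy model. The sketch below is not the authors' method: it replaces the learned contextual performance profile and the deep reinforcement learning policy with a small, hand-written table, and it assumes the current solution quality is directly observable, which is exactly the assumption the paper removes. It also ignores planner hyperparameters. The function name `stopping_policy` and all of its parameters are hypothetical. What it does show is why a non-myopic rule matters: backward induction over the remaining planning steps can choose to keep planning even when a single extra step yields no immediate improvement.

```python
from typing import List, Tuple


def stopping_policy(
    values: List[float],
    transition: List[List[float]],
    step_cost: float,
    horizon: int,
) -> Tuple[List[List[bool]], List[float]]:
    """Toy non-myopic stop/continue rule computed by backward induction.

    values[q]         : value of executing a solution at quality level q (assumed known)
    transition[q][q2] : probability that one more planning step moves q to q2
    step_cost         : cost charged for each additional planning step
    horizon           : maximum number of planning steps; the agent must stop at the horizon

    Returns (policy, start_values), where policy[t][q] is True when the agent
    should stop planning and execute at step t with quality q, and
    start_values[q] is the optimal expected net value at step 0.
    """
    n = len(values)
    # At the horizon the only option is to execute the current solution.
    policy = [[True] * n for _ in range(horizon + 1)]
    v_next = list(values)

    for t in range(horizon - 1, -1, -1):
        v_now = []
        for q in range(n):
            stop_value = values[q]
            # Pay for one more step, then act optimally from the next step on.
            continue_value = -step_cost + sum(
                transition[q][q2] * v_next[q2] for q2 in range(n)
            )
            policy[t][q] = stop_value >= continue_value
            v_now.append(max(stop_value, continue_value))
        v_next = v_now

    return policy, v_next


if __name__ == "__main__":
    # Three quality levels; improvement only pays off after two planning steps.
    values = [0.0, 0.0, 10.0]
    transition = [
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0],
    ]
    policy, start_values = stopping_policy(values, transition, step_cost=1.0, horizon=3)
    print("stop at step 0, quality 0?", policy[0][0])
    print("expected net value from quality 0:", start_values[0])
```

In the demo, a myopic rule that compares only the next step's gain against the step cost would stop immediately at quality level 0, because one extra step changes nothing. The backward-induction rule looks further ahead and continues. A learned policy of the kind described in the abstract would have to make the same kind of trade-off from predicted, rather than observed, solution values.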