Monte Carlo tree search with Boltzmann exploration

Monte-Carlo Tree Search (MCTS) methods, such as Upper Confidence Bound applied to Trees (UCT), are instrumental to automated planning techniques. However, UCT can be slow to explore an optimal action when it initially appears inferior to other actions. Maximum ENtropy Tree-Search (MENTS) incorporate...

Full description

Bibliographic Details
Main Authors: Painter, M, Baioumy, M, Hawes, N, Lacerda, B
Format: Conference item
Language:English
Published: Curran Associates 2023
_version_ 1826313212587409408
author Painter, M
Baioumy, M
Hawes, N
Lacerda, B
author_facet Painter, M
Baioumy, M
Hawes, N
Lacerda, B
author_sort Painter, M
collection OXFORD
description Monte-Carlo Tree Search (MCTS) methods, such as Upper Confidence Bound applied to Trees (UCT), are instrumental to automated planning techniques. However, UCT can be slow to explore an optimal action when it initially appears inferior to other actions. Maximum ENtropy Tree-Search (MENTS) incorporates the maximum entropy principle into an MCTS approach, utilising Boltzmann policies to sample actions, naturally encouraging more exploration. In this paper, we highlight a major limitation of MENTS: optimal actions for the maximum entropy objective do not necessarily correspond to optimal actions for the original objective. We introduce two algorithms, Boltzmann Tree Search (BTS) and Decaying ENtropy Tree-Search (DENTS), that address these limitations and preserve the benefits of Boltzmann policies, such as allowing actions to be sampled faster by using the Alias method. Our empirical analysis shows that our algorithms show consistent high performance across several benchmark domains, including the game of Go.
first_indexed 2024-09-25T04:09:33Z
format Conference item
id oxford-uuid:d40e0d03-0e74-444b-883f-056533c4ce49
institution University of Oxford
language English
last_indexed 2024-09-25T04:09:33Z
publishDate 2023
publisher Curran Associates
record_format dspace
spelling oxford-uuid:d40e0d03-0e74-444b-883f-056533c4ce492024-06-13T16:02:11ZMonte Carlo tree search with Boltzmann explorationConference itemhttp://purl.org/coar/resource_type/c_5794uuid:d40e0d03-0e74-444b-883f-056533c4ce49EnglishSymplectic ElementsCurran Associates2023Painter, MBaioumy, MHawes, NLacerda, BMonte-Carlo Tree Search (MCTS) methods, such as Upper Confidence Bound applied to Trees (UCT), are instrumental to automated planning techniques. However, UCT can be slow to explore an optimal action when it initially appears inferior to other actions. Maximum ENtropy Tree-Search (MENTS) incorporates the maximum entropy principle into an MCTS approach, utilising Boltzmann policies to sample actions, naturally encouraging more exploration. In this paper, we highlight a major limitation of MENTS: optimal actions for the maximum entropy objective do not necessarily correspond to optimal actions for the original objective. We introduce two algorithms, Boltzmann Tree Search (BTS) and Decaying ENtropy Tree-Search (DENTS), that address these limitations and preserve the benefits of Boltzmann policies, such as allowing actions to be sampled faster by using the Alias method. Our empirical analysis shows that our algorithms show consistent high performance across several benchmark domains, including the game of Go.
spellingShingle Painter, M
Baioumy, M
Hawes, N
Lacerda, B
Monte Carlo tree search with Boltzmann exploration
title Monte Carlo tree search with Boltzmann exploration
title_full Monte Carlo tree search with Boltzmann exploration
title_fullStr Monte Carlo tree search with Boltzmann exploration
title_full_unstemmed Monte Carlo tree search with Boltzmann exploration
title_short Monte Carlo tree search with Boltzmann exploration
title_sort monte carlo tree search with boltzmann exploration
work_keys_str_mv AT painterm montecarlotreesearchwithboltzmannexploration
AT baioumym montecarlotreesearchwithboltzmannexploration
AT hawesn montecarlotreesearchwithboltzmannexploration
AT lacerdab montecarlotreesearchwithboltzmannexploration