Exploratory combinatorial optimization with reinforcement learning

Many real-world problems can be reduced to combinatorial optimization on a graph, where the subset or ordering of vertices that maximizes some objective function must be found. With such tasks often NP-hard and analytically intractable, reinforcement learning (RL) has shown promise as a framework with which efficient heuristic methods to tackle these problems can be learned. Previous works construct the solution subset incrementally, adding one element at a time; however, the irreversible nature of this approach prevents the agent from revising its earlier decisions, which may be necessary given the complexity of the optimization task. We instead propose that the agent should seek to continuously improve the solution by learning to explore at test time. Our approach of exploratory combinatorial optimization (ECO-DQN) is, in principle, applicable to any combinatorial problem that can be defined on a graph. Experimentally, we show our method to produce state-of-the-art RL performance on the Maximum Cut problem. Moreover, because ECO-DQN can start from any arbitrary configuration, it can be combined with other search methods to further improve performance, which we demonstrate using a simple random search.


Bibliographic Details
Main Authors: Barrett, TD, Clements, WR, Foerster, JN, Lvovsky, AI
Format: Conference item
Language: English
Published: Association for the Advancement of Artificial Intelligence 2020
collection OXFORD
description Many real-world problems can be reduced to combinatorial optimization on a graph, where the subset or ordering of vertices that maximizes some objective function must be found. With such tasks often NP-hard and analytically intractable, reinforcement learning (RL) has shown promise as a framework with which efficient heuristic methods to tackle these problems can be learned. Previous works construct the solution subset incrementally, adding one element at a time; however, the irreversible nature of this approach prevents the agent from revising its earlier decisions, which may be necessary given the complexity of the optimization task. We instead propose that the agent should seek to continuously improve the solution by learning to explore at test time. Our approach of exploratory combinatorial optimization (ECO-DQN) is, in principle, applicable to any combinatorial problem that can be defined on a graph. Experimentally, we show our method to produce state-of-the-art RL performance on the Maximum Cut problem. Moreover, because ECO-DQN can start from any arbitrary configuration, it can be combined with other search methods to further improve performance, which we demonstrate using a simple random search.
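The exploratory paradigm the description refers to, revising a complete Max-Cut solution by flipping vertices between partitions rather than building it once irreversibly, can be illustrated without the learned Q-network. The sketch below is a hypothetical stand-in, not the authors' implementation: it replaces the DQN's learned action values with exact one-flip gains and uses random restarts in place of a learned exploration policy, purely to show test-time solution revision from arbitrary starting configurations.

```python
import random

def cut_value(adj, assignment):
    """Total weight of edges crossing the partition (each edge counted once)."""
    return sum(w for u, nbrs in adj.items() for v, w in nbrs
               if u < v and assignment[u] != assignment[v])

def flip_gain(adj, assignment, v):
    """Change in cut value if vertex v switches sides."""
    return sum(w if assignment[u] == assignment[v] else -w
               for u, w in adj[v])

def exploratory_search(adj, episodes=20, steps=50, seed=0):
    """Restartable local search over complete solutions: start each episode
    from a random configuration and greedily flip the highest-gain vertex.
    Unlike one-shot constructive methods, any earlier decision can be revised."""
    rng = random.Random(seed)
    vertices = list(adj)
    best = 0
    for _ in range(episodes):
        assign = {v: rng.randint(0, 1) for v in vertices}
        for _ in range(steps):
            v = max(vertices, key=lambda x: flip_gain(adj, assign, x))
            if flip_gain(adj, assign, v) <= 0:
                break  # local optimum for single flips; next restart may escape
            assign[v] ^= 1
        best = max(best, cut_value(adj, assign))
    return best
```

On a 4-cycle with unit weights, `exploratory_search` finds the optimal cut of 4 (the alternating bipartition), while a single greedy pass can stall at 2; the restarts play the role that learned exploration plays in the paper's agent.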
id oxford-uuid:221b3c94-e9c1-40c3-934a-236f259741d0
institution University of Oxford