Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems


Bibliographic Details
Main Authors: Qi Wang, Yongsheng Hao, Jiawei Zhang
Format: Article
Language: English
Published: Elsevier 2023-10-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157823003415
_version_ 1797629709160284160
author Qi Wang
Yongsheng Hao
Jiawei Zhang
author_facet Qi Wang
Yongsheng Hao
Jiawei Zhang
author_sort Qi Wang
collection DOAJ
description Deep reinforcement learning (DRL) has shown promise in solving challenging combinatorial optimization (CO) problems, such as the traveling salesman problem (TSP) and the vehicle routing problem (VRP). However, existing DRL methods rely on manually designed reward functions, which may be inaccurate or unrealistic, and traditional DRL algorithms suffer from unstable training and sparse rewards. To address these limitations, this paper proposes GIRL (Generative Inverse Reinforcement Learning), a method that learns 2-opt heuristics without explicit extrinsic rewards. GIRL combines generative adversarial networks (GANs) and DRL to learn effective policies and reward functions in a reverse, end-to-end fashion, improving generalization. Furthermore, we introduce a self-attentional policy network tailored to 2-opt heuristics and train the framework with a soft actor-critic algorithm alongside the GAN's discriminator. Extensive experiments on various TSP and VRP instances demonstrate superior performance compared to state-of-the-art methods. Moreover, integrating GANs and DRL yields data-driven reward functions, improving their accuracy and realism, while the self-attentional network and the soft actor-critic algorithm stabilize training and mitigate the sparse-reward problem. This work advances reinforcement learning techniques for CO, enabling more accurate and practical optimization in real-world applications.
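For context on the operator the abstract refers to: a 2-opt move removes two edges from a TSP tour and reconnects the two resulting segments with one segment reversed, keeping the move if the tour gets shorter. The sketch below is not the paper's learned, self-attentional policy; it is the classic greedy 2-opt local search that such policies build on, applied to random Euclidean points (all names and the instance setup are illustrative assumptions).

```python
import math
import random

def tour_length(tour, pts):
    # Sum of Euclidean edge lengths, including the closing edge back to the start.
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, pts):
    # Repeatedly apply improving 2-opt moves until no move shortens the tour
    # (a 2-opt local optimum). Each candidate move reverses the segment
    # tour[i:j], which is equivalent to deleting two edges and reconnecting.
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(candidate, pts) < tour_length(tour, pts) - 1e-12:
                    tour, improved = candidate, True
    return tour

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(20)]
start = list(range(20))          # arbitrary initial tour
better = two_opt(start, pts)     # locally optimal tour, never longer than start
```

A learned 2-opt policy, as described in the abstract, replaces the exhaustive `(i, j)` scan with a network that proposes which segment to reverse, and in GIRL the improvement signal comes from a discriminator rather than a hand-designed reward.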
first_indexed 2024-03-11T10:57:54Z
format Article
id doaj.art-129ce68c874f4900b13a9985a6f3556a
institution Directory Open Access Journal
issn 1319-1578
language English
last_indexed 2024-03-11T10:57:54Z
publishDate 2023-10-01
publisher Elsevier
record_format Article
series Journal of King Saud University: Computer and Information Sciences
spelling doaj.art-129ce68c874f4900b13a9985a6f3556a 2023-11-13T04:09:04Z
eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; 1319-1578; 2023-10-01; Vol. 35, No. 9, Article 101787
Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
Qi Wang: Information Science and Technology College, Dalian Maritime University, Dalian 116026, China (corresponding author)
Yongsheng Hao: School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing 210044, China; Network Center, Nanjing University of Information Science & Technology, Nanjing 210044, China
Jiawei Zhang: Department of New Network, Peng Cheng Laboratory, China
http://www.sciencedirect.com/science/article/pii/S1319157823003415
Routing problems; Deep reinforcement learning; Generative adversarial networks; 2-opt heuristics; Inverse reinforcement learning
spellingShingle Qi Wang
Yongsheng Hao
Jiawei Zhang
Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
Journal of King Saud University: Computer and Information Sciences
Routing problems
Deep reinforcement learning
Generative adversarial networks
2-opt heuristics
Inverse reinforcement learning
title Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
title_full Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
title_fullStr Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
title_full_unstemmed Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
title_short Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
title_sort generative inverse reinforcement learning for learning 2 opt heuristics without extrinsic rewards in routing problems
topic Routing problems
Deep reinforcement learning
Generative adversarial networks
2-opt heuristics
Inverse reinforcement learning
url http://www.sciencedirect.com/science/article/pii/S1319157823003415
work_keys_str_mv AT qiwang generativeinversereinforcementlearningforlearning2optheuristicswithoutextrinsicrewardsinroutingproblems
AT yongshenghao generativeinversereinforcementlearningforlearning2optheuristicswithoutextrinsicrewardsinroutingproblems
AT jiaweizhang generativeinversereinforcementlearningforlearning2optheuristicswithoutextrinsicrewardsinroutingproblems