Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
Deep reinforcement learning (DRL) has shown promise in solving challenging combinatorial optimization (CO) problems, such as the traveling salesman problem (TSP) and vehicle routing problem (VRP). However, existing DRL methods rely on manually designed reward functions, which may be inaccurate or unrealistic. Moreover, traditional DRL algorithms suffer from unstable training and sparse reward problems. This paper proposes GIRL (Generative Inverse Reinforcement Learning), a method to learn 2-opt heuristics without explicit extrinsic rewards to address these limitations. GIRL combines generative adversarial networks (GANs) and DRL to learn effective policies and reward functions in a reverse end-to-end fashion, improving generalization capabilities. Furthermore, we introduce a self-attentional policy network tailored for 2-opt heuristics and train the framework using a soft actor-critic algorithm along with a discriminator in the GAN. Extensive experiments on various TSP and VRP instances demonstrate superior performance compared to state-of-the-art methods. Moreover, integrating GANs and DRL enables data-driven reward functions, improving accuracy and realism. Using self-attentional networks and the soft actor-critic algorithm enhances training stability and addresses the sparse reward problem. This work advances reinforcement learning techniques in CO, enabling more accurate and practical optimization methods in real-world applications.
Main Authors: | Qi Wang, Yongsheng Hao, Jiawei Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-10-01 |
Series: | Journal of King Saud University: Computer and Information Sciences |
Subjects: | Routing problems; Deep reinforcement learning; Generative adversarial networks; 2-opt heuristics; Inverse reinforcement learning |
Online Access: | http://www.sciencedirect.com/science/article/pii/S1319157823003415 |
_version_ | 1797629709160284160 |
---|---|
author | Qi Wang; Yongsheng Hao; Jiawei Zhang |
author_facet | Qi Wang; Yongsheng Hao; Jiawei Zhang |
author_sort | Qi Wang |
collection | DOAJ |
description | Deep reinforcement learning (DRL) has shown promise in solving challenging combinatorial optimization (CO) problems, such as the traveling salesman problem (TSP) and vehicle routing problem (VRP). However, existing DRL methods rely on manually designed reward functions, which may be inaccurate or unrealistic. Moreover, traditional DRL algorithms suffer from unstable training and sparse reward problems. This paper proposes GIRL (Generative Inverse Reinforcement Learning), a method to learn 2-opt heuristics without explicit extrinsic rewards to address these limitations. GIRL combines generative adversarial networks (GANs) and DRL to learn effective policies and reward functions in a reverse end-to-end fashion, improving generalization capabilities. Furthermore, we introduce a self-attentional policy network tailored for 2-opt heuristics and train the framework using a soft actor-critic algorithm along with a discriminator in the GAN. Extensive experiments on various TSP and VRP instances demonstrate superior performance compared to state-of-the-art methods. Moreover, integrating GANs and DRL enables data-driven reward functions, improving accuracy and realism. Using self-attentional networks and the soft actor-critic algorithm enhances training stability and addresses the sparse reward problem. This work advances reinforcement learning techniques in CO, enabling more accurate and practical optimization methods in real-world applications. |
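For readers unfamiliar with the 2-opt move that the paper's policy learns to apply, the following is a minimal, self-contained sketch of the classical 2-opt local search for the TSP. It is illustrative only and is not the paper's learned GIRL policy: a 2-opt move removes two edges of a tour and reconnects it by reversing the segment between them, accepted whenever the tour gets shorter.

```python
import math

def tour_length(tour, pts):
    """Total length of a closed tour over 2-D points."""
    return sum(
        math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def two_opt(tour, pts):
    """Apply improving 2-opt moves (reverse a segment) until none remain."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # reversing the whole tour changes nothing
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Change in length from replacing edges (a,b),(c,d) with (a,c),(b,d)
                delta = (math.dist(pts[a], pts[c]) + math.dist(pts[b], pts[d])
                         - math.dist(pts[a], pts[b]) - math.dist(pts[c], pts[d]))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour
```

For example, on the unit square `[(0,0),(1,0),(1,1),(0,1)]`, the crossing tour `[0, 2, 1, 3]` (length 2 + 2√2 ≈ 4.83) is repaired by a single 2-opt move to length 4.0. GIRL replaces this hand-tuned "shorter is better" acceptance criterion with a reward learned by the GAN discriminator.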
first_indexed | 2024-03-11T10:57:54Z |
format | Article |
id | doaj.art-129ce68c874f4900b13a9985a6f3556a |
institution | Directory Open Access Journal |
issn | 1319-1578 |
language | English |
last_indexed | 2024-03-11T10:57:54Z |
publishDate | 2023-10-01 |
publisher | Elsevier |
record_format | Article |
series | Journal of King Saud University: Computer and Information Sciences |
spelling | doaj.art-129ce68c874f4900b13a9985a6f3556a (indexed 2023-11-13T04:09:04Z); eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; ISSN 1319-1578; published 2023-10-01; 35(9): 101787.
Title: Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems.
Authors and affiliations: Qi Wang (Information Science and Technology College, Dalian Maritime University, Dalian 116026, China; corresponding author); Yongsheng Hao (School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing 210044, China; Network Center, Nanjing University of Information Science & Technology, Nanjing 210044, China); Jiawei Zhang (Department of New Network, Peng Cheng Laboratory, China).
Abstract: as in the description field above.
URL: http://www.sciencedirect.com/science/article/pii/S1319157823003415
Subjects: Routing problems; Deep reinforcement learning; Generative adversarial networks; 2-opt heuristics; Inverse reinforcement learning |
title | Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems |
topic | Routing problems Deep reinforcement learning Generative adversarial networks 2-opt heuristics Inverse reinforcement learning |
url | http://www.sciencedirect.com/science/article/pii/S1319157823003415 |