Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems


Bibliographic Details
Main Authors: Qi Wang, Yongsheng Hao, Jiawei Zhang
Format: Article
Language: English
Published: Elsevier 2023-10-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157823003415
_version_ 1797629709160284160
author Qi Wang
Yongsheng Hao
Jiawei Zhang
author_facet Qi Wang
Yongsheng Hao
Jiawei Zhang
author_sort Qi Wang
collection DOAJ
description Deep reinforcement learning (DRL) has shown promise in solving challenging combinatorial optimization (CO) problems, such as the traveling salesman problem (TSP) and the vehicle routing problem (VRP). However, existing DRL methods rely on manually designed reward functions, which may be inaccurate or unrealistic, and traditional DRL algorithms suffer from unstable training and sparse rewards. To address these limitations, this paper proposes GIRL (Generative Inverse Reinforcement Learning), a method that learns 2-opt heuristics without explicit extrinsic rewards. GIRL combines generative adversarial networks (GANs) and DRL to learn effective policies and reward functions in a reverse, end-to-end fashion, improving generalization. Furthermore, we introduce a self-attentional policy network tailored to 2-opt heuristics and train the framework with a soft actor-critic algorithm alongside the GAN's discriminator. Extensive experiments on various TSP and VRP instances demonstrate superior performance compared to state-of-the-art methods. Moreover, integrating GANs and DRL yields data-driven reward functions, improving their accuracy and realism, while the self-attentional network and the soft actor-critic algorithm stabilize training and mitigate the sparse-reward problem. This work advances reinforcement learning techniques for CO, enabling more accurate and practical optimization in real-world applications.
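For context on the operator the abstract refers to: a 2-opt move removes two edges from a TSP tour and reconnects the two resulting segments with one segment reversed, keeping the move if the tour gets shorter. The sketch below is not the paper's learned, self-attentional policy; it is the classic greedy 2-opt local search that such policies build on, applied to random Euclidean points (all names and the instance setup are illustrative assumptions).

```python
import math
import random

def tour_length(tour, pts):
    # Sum of Euclidean edge lengths, including the closing edge back to the start.
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, pts):
    # Repeatedly apply improving 2-opt moves until no move shortens the tour
    # (a 2-opt local optimum). Each candidate move reverses the segment
    # tour[i:j], which is equivalent to deleting two edges and reconnecting.
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(candidate, pts) < tour_length(tour, pts) - 1e-12:
                    tour, improved = candidate, True
    return tour

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(20)]
start = list(range(20))          # arbitrary initial tour
better = two_opt(start, pts)     # locally optimal tour, never longer than start
```

A learned 2-opt policy, as described in the abstract, replaces the exhaustive `(i, j)` scan with a network that proposes which segment to reverse, and in GIRL the improvement signal comes from a discriminator rather than a hand-designed reward.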
first_indexed 2024-03-11T10:57:54Z
format Article
id doaj.art-129ce68c874f4900b13a9985a6f3556a
institution Directory Open Access Journal
issn 1319-1578
language English
last_indexed 2024-03-11T10:57:54Z
publishDate 2023-10-01
publisher Elsevier
record_format Article
series Journal of King Saud University: Computer and Information Sciences
spelling doaj.art-129ce68c874f4900b13a9985a6f3556a 2023-11-13T04:09:04Z
eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; 1319-1578; 2023-10-01; Vol. 35, No. 9, Article 101787
Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
Qi Wang: Information Science and Technology College, Dalian Maritime University, Dalian 116026, China (corresponding author)
Yongsheng Hao: School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing 210044, China; Network Center, Nanjing University of Information Science & Technology, Nanjing 210044, China
Jiawei Zhang: Department of New Network, Peng Cheng Laboratory, China
http://www.sciencedirect.com/science/article/pii/S1319157823003415
Routing problems; Deep reinforcement learning; Generative adversarial networks; 2-opt heuristics; Inverse reinforcement learning
spellingShingle Qi Wang
Yongsheng Hao
Jiawei Zhang
Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
Journal of King Saud University: Computer and Information Sciences
Routing problems
Deep reinforcement learning
Generative adversarial networks
2-opt heuristics
Inverse reinforcement learning
title Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
title_full Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
title_fullStr Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
title_full_unstemmed Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
title_short Generative inverse reinforcement learning for learning 2-opt heuristics without extrinsic rewards in routing problems
title_sort generative inverse reinforcement learning for learning 2 opt heuristics without extrinsic rewards in routing problems
topic Routing problems
Deep reinforcement learning
Generative adversarial networks
2-opt heuristics
Inverse reinforcement learning
url http://www.sciencedirect.com/science/article/pii/S1319157823003415
work_keys_str_mv AT qiwang generativeinversereinforcementlearningforlearning2optheuristicswithoutextrinsicrewardsinroutingproblems
AT yongshenghao generativeinversereinforcementlearningforlearning2optheuristicswithoutextrinsicrewardsinroutingproblems
AT jiaweizhang generativeinversereinforcementlearningforlearning2optheuristicswithoutextrinsicrewardsinroutingproblems