Learning generalizable heuristics for solving vehicle routing problem under distribution shift

Combinatorial optimization problems (COPs) with NP-hardness are always featured by discrete search space and intractable computation to seek the optimal solution. As a fundamental COP in logistics, the vehicle routing problem (VRP) concerns the cost-optimal delivery of items from the depot to a set...

Descripción completa

Detalles Bibliográficos
Autor principal: Jiang, Yuan
Otros Autores: Zhang Jie
Formato: Thesis-Doctor of Philosophy
Lenguaje:English
Publicado: Nanyang Technological University 2024
Materias:
Acceso en línea:https://hdl.handle.net/10356/173657
_version_ 1826114170250067968
author Jiang, Yuan
author2 Zhang Jie
author_facet Zhang Jie
Jiang, Yuan
author_sort Jiang, Yuan
collection NTU
description Combinatorial optimization problems (COPs) with NP-hardness are always featured by discrete search space and intractable computation to seek the optimal solution. As a fundamental COP in logistics, the vehicle routing problem (VRP) concerns the cost-optimal delivery of items from the depot to a set of geographically scattered customers through vehicle(s). It has been extensively investigated for decades and found widespread applications in reality, such as waste collection, dial-a-ride and courier delivery. Studies on VRP with deep (reinforcement) learning have been emerging in recent years. Different from the conventional methods, this line of work aims at automatically searching heuristic policies by using neural networks to learn the underlying patterns in instances, which could be used to discover better policies than hand-crafted ones. Towards reducing the gaps to the highly optimized conventional heuristic solvers, a large number of efforts have been made to invent various deep models to solve the VRP variants, i.e., traveling salesman problem (TSP) and capacitated vehicle routing problem (CVRP). Although much success has been achieved, existing deep models always assume a pure spatial distribution of nodes (customers) for training, i.e., uniform distribution, which severely limits their applications given the impaired cross-distribution generalization ability. While it is natural to stipulate that a majority of typical instances follow an identical distribution, a minority of atypical (yet important) instances which follow different one(s) may always exist in reality. In this sense, the mono-training paradigm in existing deep models may cause inferior performance or even failures for those atypical instances. E.g., a trained routing policy may offer unreasonable routes to serve a group of (important) customers whose location pattern differs from the ones used in training. Intuitively, an alternative is to force the model to homogeneously treat all the instances of different distributions for training. However, it does not necessarily enhance the cross-distribution generalization performance in our view, as it may suffer high losses in the group of the atypical instances. To address this issue, in the first work, we propose an approach by exploiting group distributionally robust optimization (group DRO) to jointly train the deep models on instances of different groups (more than one group), where each group follows a distribution. In particular, we aim at minimizing the loss for the worst-case group during training, where we optimize the weights for different groups of instances and the parameters for the deep model in an interleaved manner. More importantly, we do not need to label the distribution for each group during inference. In addition, we also leverage convolutional neural network (CNN) to learn initial representations of VRP instances so that the distribution-aware features in spatial patterns could be favorably captured to boost the performance further. Secondly, motivated by the facts that, 1) a VRP instance can always be represented as a graph, 2) the VRP solution depends on the pattern of the graph (e.g. the distribution of nodes), we postulate that transferable structural patterns across diverse graphs could be helpful to improve the generalization against different distributions. Especially, similar local patterns may distribute across graphs even if those graphs belong to different distributions. On the other hand, recent advances in computer vision (CV) and natural language processing (NLP) have testified that pre-training an encoder network in a contrastive learning manner can produce more informative and transferable representations for downstream tasks. With the above principle, in my second work, we propose a multi-view graph contrastive learning (MVGCL) approach to foster the generalization capability of neural heuristics for VRPs, by mining the underlying patterns across graphs. In the third work, we propose an ensemble-based deep reinforcement learning method for VRPs, which learns a group of diverse sub-policies to cope with various instance distributions. In particular, to prevent convergence of the parameters to the same one, we enforce diversity across sub-policies by leveraging Bootstrap with random initialization. Moreover, we also explicitly pursue inequality between sub-policies by exploiting regularization terms during training to further enhance diversity. Experimental results show that our methods are able to outperform the state-of-the-art neural baselines on randomly generated instances of various distributions, and also generalizes favourably on the benchmark instances from TSPLib and CVRPLib, which confirmed the effectiveness of the whole method and the respective designs.
first_indexed 2024-10-01T03:35:07Z
format Thesis-Doctor of Philosophy
id ntu-10356/173657
institution Nanyang Technological University
language English
last_indexed 2024-10-01T03:35:07Z
publishDate 2024
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/1736572024-03-07T08:52:06Z Learning generalizable heuristics for solving vehicle routing problem under distribution shift Jiang, Yuan Zhang Jie School of Computer Science and Engineering ZhangJ@ntu.edu.sg Computer and Information Science Vehicle routing problem Combinatorial optimization problem Deep learning Distribution shift Combinatorial optimization problems (COPs) with NP-hardness are always featured by discrete search space and intractable computation to seek the optimal solution. As a fundamental COP in logistics, the vehicle routing problem (VRP) concerns the cost-optimal delivery of items from the depot to a set of geographically scattered customers through vehicle(s). It has been extensively investigated for decades and found widespread applications in reality, such as waste collection, dial-a-ride and courier delivery. Studies on VRP with deep (reinforcement) learning have been emerging in recent years. Different from the conventional methods, this line of work aims at automatically searching heuristic policies by using neural networks to learn the underlying patterns in instances, which could be used to discover better policies than hand-crafted ones. Towards reducing the gaps to the highly optimized conventional heuristic solvers, a large number of efforts have been made to invent various deep models to solve the VRP variants, i.e., traveling salesman problem (TSP) and capacitated vehicle routing problem (CVRP). Although much success has been achieved, existing deep models always assume a pure spatial distribution of nodes (customers) for training, i.e., uniform distribution, which severely limits their applications given the impaired cross-distribution generalization ability. While it is natural to stipulate that a majority of typical instances follow an identical distribution, a minority of atypical (yet important) instances which follow different one(s) may always exist in reality. In this sense, the mono-training paradigm in existing deep models may cause inferior performance or even failures for those atypical instances. E.g., a trained routing policy may offer unreasonable routes to serve a group of (important) customers whose location pattern differs from the ones used in training. Intuitively, an alternative is to force the model to homogeneously treat all the instances of different distributions for training. However, it does not necessarily enhance the cross-distribution generalization performance in our view, as it may suffer high losses in the group of the atypical instances. To address this issue, in the first work, we propose an approach by exploiting group distributionally robust optimization (group DRO) to jointly train the deep models on instances of different groups (more than one group), where each group follows a distribution. In particular, we aim at minimizing the loss for the worst-case group during training, where we optimize the weights for different groups of instances and the parameters for the deep model in an interleaved manner. More importantly, we do not need to label the distribution for each group during inference. In addition, we also leverage convolutional neural network (CNN) to learn initial representations of VRP instances so that the distribution-aware features in spatial patterns could be favorably captured to boost the performance further. Secondly, motivated by the facts that, 1) a VRP instance can always be represented as a graph, 2) the VRP solution depends on the pattern of the graph (e.g. the distribution of nodes), we postulate that transferable structural patterns across diverse graphs could be helpful to improve the generalization against different distributions. Especially, similar local patterns may distribute across graphs even if those graphs belong to different distributions. On the other hand, recent advances in computer vision (CV) and natural language processing (NLP) have testified that pre-training an encoder network in a contrastive learning manner can produce more informative and transferable representations for downstream tasks. With the above principle, in my second work, we propose a multi-view graph contrastive learning (MVGCL) approach to foster the generalization capability of neural heuristics for VRPs, by mining the underlying patterns across graphs. In the third work, we propose an ensemble-based deep reinforcement learning method for VRPs, which learns a group of diverse sub-policies to cope with various instance distributions. In particular, to prevent convergence of the parameters to the same one, we enforce diversity across sub-policies by leveraging Bootstrap with random initialization. Moreover, we also explicitly pursue inequality between sub-policies by exploiting regularization terms during training to further enhance diversity. Experimental results show that our methods are able to outperform the state-of-the-art neural baselines on randomly generated instances of various distributions, and also generalizes favourably on the benchmark instances from TSPLib and CVRPLib, which confirmed the effectiveness of the whole method and the respective designs. Doctor of Philosophy 2024-02-21T05:20:39Z 2024-02-21T05:20:39Z 2024 Thesis-Doctor of Philosophy Jiang, Y. (2024). Learning generalizable heuristics for solving vehicle routing problem under distribution shift. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173657 https://hdl.handle.net/10356/173657 10.32657/10356/173657 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
spellingShingle Computer and Information Science
Vehicle routing problem
Combinatorial optimization problem
Deep learning
Distribution shift
Jiang, Yuan
Learning generalizable heuristics for solving vehicle routing problem under distribution shift
title Learning generalizable heuristics for solving vehicle routing problem under distribution shift
title_full Learning generalizable heuristics for solving vehicle routing problem under distribution shift
title_fullStr Learning generalizable heuristics for solving vehicle routing problem under distribution shift
title_full_unstemmed Learning generalizable heuristics for solving vehicle routing problem under distribution shift
title_short Learning generalizable heuristics for solving vehicle routing problem under distribution shift
title_sort learning generalizable heuristics for solving vehicle routing problem under distribution shift
topic Computer and Information Science
Vehicle routing problem
Combinatorial optimization problem
Deep learning
Distribution shift
url https://hdl.handle.net/10356/173657
work_keys_str_mv AT jiangyuan learninggeneralizableheuristicsforsolvingvehicleroutingproblemunderdistributionshift