Reinforcement Learning Approach to Stochastic Vehicle Routing Problem With Correlated Demands

We present a novel end-to-end framework for solving the Vehicle Routing Problem with stochastic demands (VRPSD) using Reinforcement Learning (RL). Our formulation incorporates the correlation between stochastic demands through other observable stochastic variables, thereby offering an experimental d...

Bibliographic Details
Main Authors: Zangir Iklassov, Ikboljon Sobirov, Ruben Solozabal, Martin Takac
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Reinforcement learning; stochastic optimization; vehicle routing problem
Online Access: https://ieeexplore.ieee.org/document/10223206/
_version_ 1797736961196163072
author Zangir Iklassov
Ikboljon Sobirov
Ruben Solozabal
Martin Takac
author_facet Zangir Iklassov
Ikboljon Sobirov
Ruben Solozabal
Martin Takac
author_sort Zangir Iklassov
collection DOAJ
description We present a novel end-to-end framework for solving the Vehicle Routing Problem with stochastic demands (VRPSD) using Reinforcement Learning (RL). Our formulation incorporates the correlation between stochastic demands through other observable stochastic variables, thereby offering an experimental demonstration of the theoretical premise that non-i.i.d. stochastic demands provide opportunities for improved routing solutions. Our approach bridges the gap in the application of RL to VRPSD and consists of a parameterized stochastic policy optimized using a policy gradient algorithm to generate a sequence of actions that form the solution. Our model outperforms previous state-of-the-art metaheuristics and demonstrates robustness to changes in the environment, such as the supply type, vehicle capacity, correlation, and noise levels of demand. Moreover, the model can be easily retrained for different VRPSD scenarios by observing the reward signals and following feasibility constraints, making it highly flexible and scalable. These findings highlight the potential of RL to enhance transportation efficiency and mitigate its environmental impact in stochastic routing problems. Our implementation is available at https://github.com/Zangir/SVRP.
first_indexed 2024-03-12T13:21:29Z
format Article
id doaj.art-8106d37416784508906865d6f0a2a522
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-12T13:21:29Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-8106d37416784508906865d6f0a2a522
2023-08-25T23:00:57Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
11
87958
87969
10.1109/ACCESS.2023.3306076
10223206
Reinforcement Learning Approach to Stochastic Vehicle Routing Problem With Correlated Demands
Zangir Iklassov (0) https://orcid.org/0000-0002-2835-990X
Ikboljon Sobirov (1) https://orcid.org/0000-0002-0476-6359
Ruben Solozabal (2)
Martin Takac (3) https://orcid.org/0000-0001-7455-2025
Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, United Arab Emirates (listed for each of the four authors)
https://ieeexplore.ieee.org/document/10223206/
Reinforcement learning
stochastic optimization
vehicle routing problem
spellingShingle Zangir Iklassov
Ikboljon Sobirov
Ruben Solozabal
Martin Takac
Reinforcement Learning Approach to Stochastic Vehicle Routing Problem With Correlated Demands
IEEE Access
Reinforcement learning
stochastic optimization
vehicle routing problem
title Reinforcement Learning Approach to Stochastic Vehicle Routing Problem With Correlated Demands
topic Reinforcement learning
stochastic optimization
vehicle routing problem
url https://ieeexplore.ieee.org/document/10223206/