Distributional reinforcement learning for inventory management in multi-echelon supply chains

Reinforcement Learning (RL) is an effective method for solving stochastic sequential decision-making problems. Such problems are common in supply chain operations; however, most RL algorithms are tailored for game-based benchmarks. Here, we propose a deep RL method tailored for supply chain...


Bibliographic Details
Main Authors: Guoquan Wu, Miguel Ángel de Carvalho Servia, Max Mowbray
Format: Article
Language: English
Published: Elsevier 2023-03-01
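Series: Digital Chemical Engineering
Subjects: Distributional reinforcement learning; Optimal control; Inventory management; Multi-echelon supply chains; Machine learning
Online Access: http://www.sciencedirect.com/science/article/pii/S2772508122000643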
_version_ 1811159883998822400
author Guoquan Wu
Miguel Ángel de Carvalho Servia
Max Mowbray
collection DOAJ
description Reinforcement Learning (RL) is an effective method for solving stochastic sequential decision-making problems. Such problems are common in supply chain operations; however, most RL algorithms are tailored for game-based benchmarks. Here, we propose a deep RL method tailored for supply chain problems. The proposed algorithm deploys a derivative-free approach to balance exploration and exploitation of the neural policy’s parameter space, providing a means to avoid low-quality local optima. Furthermore, the method allows consideration of risk-sensitive formulations to learn a policy that optimizes, for example, the conditional value-at-risk. The capabilities of our algorithm are tested on a multi-echelon supply chain problem and several combinatorial optimization problems. The results empirically demonstrate the method’s improved sample efficiency compared to the benchmark algorithm, proximal policy optimization, and superior performance to shrinking-horizon mixed-integer formulations. Additionally, its risk-sensitive policy can offer protection from low-probability, high-severity scenarios. Finally, we provide a sensitivity analysis for technical intuition.
first_indexed 2024-04-10T05:48:15Z
format Article
id doaj.art-170402890cfe4f33a8b730fbfd14cd6d
institution Directory Open Access Journal
issn 2772-5081
language English
last_indexed 2024-04-10T05:48:15Z
publishDate 2023-03-01
publisher Elsevier
record_format Article
series Digital Chemical Engineering
spelling doaj.art-170402890cfe4f33a8b730fbfd14cd6d; 2023-03-05T04:26:09Z; eng; Elsevier; Digital Chemical Engineering; 2772-5081; 2023-03-01; 6; 100073
Distributional reinforcement learning for inventory management in multi-echelon supply chains
Guoquan Wu (0): Department of Chemical and Biomolecular Engineering, National University of Singapore, 117585, Singapore
Miguel Ángel de Carvalho Servia (1): Department of Chemical Engineering, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom
Max Mowbray (2): Centre for Process Integration, Department of Chemical Engineering, The University of Manchester, Manchester, M13 9PL, United Kingdom; Corresponding author.
http://www.sciencedirect.com/science/article/pii/S2772508122000643
title Distributional reinforcement learning for inventory management in multi-echelon supply chains
topic Distributional reinforcement learning
Optimal control
Inventory management
Multi-echelon supply chains
Machine learning
url http://www.sciencedirect.com/science/article/pii/S2772508122000643