Distributional reinforcement learning for inventory management in multi-echelon supply chains
Reinforcement Learning (RL) is an effective method to solve stochastic sequential decision-making problems. This is a problem description common to supply chain operations; however, most RL algorithms are tailored for game-based benchmarks. Here, we propose a deep RL method tailored for supply chain...
Main Authors: | Guoquan Wu, Miguel Ángel de Carvalho Servia, Max Mowbray |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-03-01 |
Series: | Digital Chemical Engineering |
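Subjects: | Distributional reinforcement learning; Optimal control; Inventory management; Multi-echelon supply chains; Machine learning |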
Online Access: | http://www.sciencedirect.com/science/article/pii/S2772508122000643 |
_version_ | 1811159883998822400 |
---|---|
author | Guoquan Wu; Miguel Ángel de Carvalho Servia; Max Mowbray |
author_sort | Guoquan Wu |
collection | DOAJ |
description | Reinforcement Learning (RL) is an effective method to solve stochastic sequential decision-making problems. This is a problem description common to supply chain operations; however, most RL algorithms are tailored for game-based benchmarks. Here, we propose a deep RL method tailored for supply chain problems. The proposed algorithm deploys a derivative-free approach to balance exploration and exploitation of the neural policy’s parameter space, providing a means to avoid low-quality local optima. Furthermore, the method allows consideration of risk-sensitive formulations to learn a policy that optimizes, for example, the conditional value-at-risk. The capabilities of our algorithm are tested on a multi-echelon supply chain problem and several combinatorial optimization problems. The results empirically demonstrate the method’s improved sample efficiency compared to the benchmark algorithm, proximal policy optimization, and superior performance to shrinking horizon mixed integer formulations. Additionally, its risk-sensitive policy can offer protection from low-probability, high-severity scenarios. Finally, we provide a sensitivity analysis for technical intuition. |
first_indexed | 2024-04-10T05:48:15Z |
format | Article |
id | doaj.art-170402890cfe4f33a8b730fbfd14cd6d |
institution | Directory Open Access Journal |
issn | 2772-5081 |
language | English |
last_indexed | 2024-04-10T05:48:15Z |
publishDate | 2023-03-01 |
publisher | Elsevier |
record_format | Article |
series | Digital Chemical Engineering |
spelling | doaj.art-170402890cfe4f33a8b730fbfd14cd6d; 2023-03-05T04:26:09Z; eng; Elsevier; Digital Chemical Engineering; 2772-5081; 2023-03-01; 6; 100073; Distributional reinforcement learning for inventory management in multi-echelon supply chains; Guoquan Wu (0), Miguel Ángel de Carvalho Servia (1), Max Mowbray (2); (0) Department of Chemical and Biomolecular Engineering, National University of Singapore, 117585, Singapore; (1) Department of Chemical Engineering, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom; (2) Centre for Process Integration, Department of Chemical Engineering, The University of Manchester, Manchester, M13 9PL, United Kingdom (corresponding author); Reinforcement Learning (RL) is an effective method to solve stochastic sequential decision-making problems. This is a problem description common to supply chain operations; however, most RL algorithms are tailored for game-based benchmarks. Here, we propose a deep RL method tailored for supply chain problems. The proposed algorithm deploys a derivative-free approach to balance exploration and exploitation of the neural policy’s parameter space, providing a means to avoid low-quality local optima. Furthermore, the method allows consideration of risk-sensitive formulations to learn a policy that optimizes, for example, the conditional value-at-risk. The capabilities of our algorithm are tested on a multi-echelon supply chain problem and several combinatorial optimization problems. The results empirically demonstrate the method’s improved sample efficiency compared to the benchmark algorithm, proximal policy optimization, and superior performance to shrinking horizon mixed integer formulations. Additionally, its risk-sensitive policy can offer protection from low-probability, high-severity scenarios. Finally, we provide a sensitivity analysis for technical intuition.; http://www.sciencedirect.com/science/article/pii/S2772508122000643; Distributional reinforcement learning; Optimal control; Inventory management; Multi-echelon supply chains; Machine learning |
title | Distributional reinforcement learning for inventory management in multi-echelon supply chains |
title_sort | distributional reinforcement learning for inventory management in multi echelon supply chains |
topic | Distributional reinforcement learning; Optimal control; Inventory management; Multi-echelon supply chains; Machine learning |
url | http://www.sciencedirect.com/science/article/pii/S2772508122000643 |
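
The description field names the conditional value-at-risk (CVaR) as one risk-sensitive objective a policy could optimize. As a minimal sketch of what that quantity measures, and not the authors' implementation, the Python snippet below estimates CVaR from sampled episode returns as the mean of the worst alpha-fraction of outcomes. The function name `empirical_cvar`, the `alpha` parameter, and the simulated returns are assumptions made for illustration; none of them come from the record.

```python
import numpy as np


def empirical_cvar(returns, alpha=0.1):
    """Estimate CVaR as the mean of the worst alpha-fraction of returns.

    Hypothetical helper for illustration only. Returns are treated as
    rewards (higher is better), so the "worst" outcomes are the lowest.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = np.sort(np.asarray(returns, dtype=float))  # ascending order
    if ordered.size == 0:
        raise ValueError("returns must not be empty")
    # Number of samples in the lower tail; always keep at least one.
    k = max(1, int(np.ceil(alpha * ordered.size)))
    return float(ordered[:k].mean())


if __name__ == "__main__":
    # Simulated episode returns (assumed data, not results from the paper).
    rng = np.random.default_rng(0)
    episode_returns = rng.normal(loc=100.0, scale=20.0, size=1000)
    print("mean return:", episode_returns.mean())
    print("CVaR (alpha=0.1):", empirical_cvar(episode_returns, alpha=0.1))
```

With `alpha=1.0` the estimate reduces to the ordinary mean return, while smaller values of `alpha` weight only the low-probability, high-severity outcomes, which is the sense in which a CVaR objective is risk-sensitive.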