Distributional reinforcement learning for inventory management in multi-echelon supply chains

Reinforcement Learning (RL) is an effective method to solve stochastic sequential decision-making problems. This is a problem description common to supply chain operations; however, most RL algorithms are tailored for game-based benchmarks. Here, we propose a deep RL method tailored for supply chain...

Bibliographic Details
Main Authors:	Guoquan Wu, Miguel Ángel de Carvalho Servia, Max Mowbray
Format:	Article
Language:	English
Published:	Elsevier 2023-03-01
Series:	Digital Chemical Engineering
Subjects:	Distributional reinforcement learning Optimal control Inventory management Multi-echelon supply chains Machine learning
Online Access:	http://www.sciencedirect.com/science/article/pii/S2772508122000643

_version_	1811159883998822400
author	Guoquan Wu Miguel Ángel de Carvalho Servia Max Mowbray
author_facet	Guoquan Wu Miguel Ángel de Carvalho Servia Max Mowbray
author_sort	Guoquan Wu
collection	DOAJ
description	Reinforcement Learning (RL) is an effective method to solve stochastic sequential decision-making problems. This is a problem description common to supply chain operations; however, most RL algorithms are tailored for game-based benchmarks. Here, we propose a deep RL method tailored for supply chain problems. The proposed algorithm deploys a derivative-free approach to balance exploration and exploitation of the neural policy’s parameter space, providing a means to avoid low-quality local optima. Furthermore, the method allows consideration of risk-sensitive formulations to learn a policy that optimizes, for example, the conditional value-at-risk. The capabilities of our algorithm are tested on a multi-echelon supply chain problem and several combinatorial optimization problems. The results empirically demonstrate the method’s improved sample efficiency compared to the benchmark algorithm proximal policy optimization, and superior performance to shrinking-horizon mixed-integer formulations. Additionally, its risk-sensitive policy can offer protection from low-probability, high-severity scenarios. Finally, we provide a sensitivity analysis for technical intuition.
first_indexed	2024-04-10T05:48:15Z
format	Article
id	doaj.art-170402890cfe4f33a8b730fbfd14cd6d
institution	Directory Open Access Journal
issn	2772-5081
language	English
last_indexed	2024-04-10T05:48:15Z
publishDate	2023-03-01
publisher	Elsevier
record_format	Article
series	Digital Chemical Engineering
spelling	doaj.art-170402890cfe4f33a8b730fbfd14cd6d 2023-03-05T04:26:09Z eng Elsevier Digital Chemical Engineering 2772-5081 2023-03-01 6 100073 Distributional reinforcement learning for inventory management in multi-echelon supply chains Guoquan Wu 0 Miguel Ángel de Carvalho Servia 1 Max Mowbray 2 Department of Chemical and Biomolecular Engineering, National University of Singapore, 117585, Singapore Department of Chemical Engineering, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom Centre for Process Integration, Department of Chemical Engineering, The University of Manchester, Manchester, M13 9PL, United Kingdom; Corresponding author. Reinforcement Learning (RL) is an effective method to solve stochastic sequential decision-making problems. This is a problem description common to supply chain operations; however, most RL algorithms are tailored for game-based benchmarks. Here, we propose a deep RL method tailored for supply chain problems. The proposed algorithm deploys a derivative-free approach to balance exploration and exploitation of the neural policy’s parameter space, providing a means to avoid low-quality local optima. Furthermore, the method allows consideration of risk-sensitive formulations to learn a policy that optimizes, for example, the conditional value-at-risk. The capabilities of our algorithm are tested on a multi-echelon supply chain problem and several combinatorial optimization problems. The results empirically demonstrate the method’s improved sample efficiency compared to the benchmark algorithm proximal policy optimization, and superior performance to shrinking-horizon mixed-integer formulations. Additionally, its risk-sensitive policy can offer protection from low-probability, high-severity scenarios. Finally, we provide a sensitivity analysis for technical intuition. http://www.sciencedirect.com/science/article/pii/S2772508122000643 Distributional reinforcement learning Optimal control Inventory management Multi-echelon supply chains Machine learning
spellingShingle	Guoquan Wu Miguel Ángel de Carvalho Servia Max Mowbray Distributional reinforcement learning for inventory management in multi-echelon supply chains Digital Chemical Engineering Distributional reinforcement learning Optimal control Inventory management Multi-echelon supply chains Machine learning
title	Distributional reinforcement learning for inventory management in multi-echelon supply chains
title_full	Distributional reinforcement learning for inventory management in multi-echelon supply chains
title_fullStr	Distributional reinforcement learning for inventory management in multi-echelon supply chains
title_full_unstemmed	Distributional reinforcement learning for inventory management in multi-echelon supply chains
title_short	Distributional reinforcement learning for inventory management in multi-echelon supply chains
title_sort	distributional reinforcement learning for inventory management in multi echelon supply chains
topic	Distributional reinforcement learning Optimal control Inventory management Multi-echelon supply chains Machine learning
url	http://www.sciencedirect.com/science/article/pii/S2772508122000643
work_keys_str_mv	AT guoquanwu distributionalreinforcementlearningforinventorymanagementinmultiechelonsupplychains AT miguelangeldecarvalhoservia distributionalreinforcementlearningforinventorymanagementinmultiechelonsupplychains AT maxmowbray distributionalreinforcementlearningforinventorymanagementinmultiechelonsupplychains

Similar Items