Distributional reinforcement learning for inventory management in multi-echelon supply chains

Reinforcement Learning (RL) is an effective method to solve stochastic sequential decision-making problems. This is a problem description common to supply chain operations, however, most RL algorithms are tailored for game-based benchmarks. Here, we propose a deep RL method tailored for supply chain...

Full description

Bibliographic Details
Main Authors: Guoquan Wu, Miguel Ángel de Carvalho Servia, Max Mowbray
Format: Article
Language:English
Published: Elsevier 2023-03-01
Series:Digital Chemical Engineering
Subjects:
Online Access:http://www.sciencedirect.com/science/article/pii/S2772508122000643
Description
Summary:Reinforcement Learning (RL) is an effective method to solve stochastic sequential decision-making problems. This is a problem description common to supply chain operations, however, most RL algorithms are tailored for game-based benchmarks. Here, we propose a deep RL method tailored for supply chain problems. The proposed algorithm deploys a derivative free approach to balance exploration and exploitation of the neural policy’s parameter space, providing means to avoid low quality local optima. Furthermore, the method allows consideration of risk-sensitive formulations to learn a policy that optimizes, for example, the conditional value-at-risk. The capabilities of our algorithm are tested on a multi-echelon supply chain problem, and several combinatorial optimization problems. The results empirically demonstrate the method’s improved sample efficiency compared to the benchmark algorithm proximal policy optimization, and superior performance to shrinking horizon mixed integer formulations. Additionally, its risk-sensitive policy can offer protection from low probability, high severity scenarios. Finally, we provide a sensitivity analysis for technical intuition.
ISSN:2772-5081