Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units

Abstract Background: Reinforcement learning (RL) provides a promising technique for solving complex sequential decision-making problems in health care domains. To enable such applications, an explicit reward function encoding domain knowledge should be specified beforehand to indicate the goal of tasks...


Bibliographic Details
Main Authors: Chao Yu, Jiming Liu, Hongyi Zhao
Format: Article
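Language: English
Published: BMC 2019-04-01
Series: BMC Medical Informatics and Decision Making
Subjects: Reinforcement learning; Inverse learning; Mechanical ventilation; Sedative dosing; Intensive care units
Online Access: http://link.springer.com/article/10.1186/s12911-019-0763-6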
author Chao Yu
Jiming Liu
Hongyi Zhao
collection DOAJ
description Abstract
Background: Reinforcement learning (RL) is a promising technique for solving complex sequential decision-making problems in health care domains. To enable such applications, an explicit reward function encoding domain knowledge must be specified beforehand to indicate the goal of the task. However, medical records usually contain no explicit information about the reward function. It is therefore necessary to consider an approach in which the reward function is learned from a set of presumably optimal treatment trajectories drawn from retrospective real medical data. This paper applies inverse RL to infer the reward functions that clinicians have in mind when deciding on weaning from mechanical ventilation and sedative dosing in Intensive Care Units (ICUs).
Methods: We model the decision-making problem as a Markov Decision Process and use a batch RL method, Fitted Q Iteration with gradient boosting decision trees, to learn a suitable ventilator weaning policy from real trajectories in retrospective ICU data. A Bayesian inverse RL method is then applied to infer the latent reward function, expressed as weights that trade off the various evaluation criteria. We then evaluate how closely the policy learned with the Bayesian inverse RL method matches the policy followed by clinicians, compared with policies learned from fixed reward functions.
Results: The inverse RL method is capable of extracting meaningful indicators for recommending extubation readiness and sedative dosage. The inferred weights indicate that clinicians pay more attention to patients’ physiological stability (e.g., heart rate and respiration rate) than to the oxygenation criteria (FiO2, PEEP and SpO2) supported by previous RL methods. Moreover, by discovering the optimal weights, new effective treatment protocols can be suggested.
Conclusions: Inverse RL is an effective approach to discovering clinicians’ underlying reward functions, which can inform the design of better treatment protocols for ventilation weaning and sedative dosing in future ICUs.
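The batch RL step named in the Methods can be illustrated with a minimal sketch of Fitted Q Iteration. This is not the authors' implementation: the function names, the toy transitions, and the `TableRegressor` class are all hypothetical. In particular, `TableRegressor` is a simple averaging lookup table standing in for the gradient boosting decision trees the abstract mentions, so that the example runs with the standard library only.

```python
from collections import defaultdict


class TableRegressor:
    """Hypothetical stand-in regressor: averages the targets seen for each
    (state, action) key. The abstract names gradient boosting trees instead."""

    def __init__(self):
        self.means = {}

    def fit(self, X, y):
        sums, counts = defaultdict(float), defaultdict(int)
        for key, target in zip(X, y):
            sums[key] += target
            counts[key] += 1
        self.means = {k: sums[k] / counts[k] for k in sums}
        return self

    def predict(self, X):
        # Unseen (state, action) pairs default to a Q-value of 0.0.
        return [self.means.get(key, 0.0) for key in X]


def fitted_q_iteration(transitions, actions, gamma=0.9, n_iters=50,
                       make_regressor=TableRegressor):
    """Batch FQI over (state, action, reward, next_state, done) tuples."""
    X = [(s, a) for s, a, _, _, _ in transitions]
    model = None
    for _ in range(n_iters):
        targets = []
        for s, a, r, s_next, done in transitions:
            if model is None or done:
                # First pass, or terminal transition: no bootstrapped term.
                targets.append(r)
            else:
                q_next = max(model.predict([(s_next, b) for b in actions]))
                targets.append(r + gamma * q_next)
        # Refit the regressor on the updated regression targets.
        model = make_regressor().fit(X, targets)
    return model


def greedy_action(model, state, actions):
    """Action with the highest predicted Q-value in the given state."""
    q_values = model.predict([(state, a) for a in actions])
    return actions[q_values.index(max(q_values))]


# Toy batch (illustrative only, not ICU data): states 0 and 1, terminal state 2.
ACTIONS = ["stay", "advance"]
batch = [
    (0, "stay", 0.0, 0, False),
    (0, "advance", 0.0, 1, False),
    (1, "stay", 0.0, 1, False),
    (1, "advance", 1.0, 2, True),
]
model = fitted_q_iteration(batch, ACTIONS)
print(greedy_action(model, 0, ACTIONS), greedy_action(model, 1, ACTIONS))
```

Any regressor exposing `fit` and `predict` could be passed as `make_regressor`; a tree ensemble would additionally need the (state, action) pairs encoded as numeric feature vectors, which this sketch does not do.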
first_indexed 2024-04-11T23:11:09Z
format Article
id doaj.art-a32a2f757eb3416f8c09264c784c7a81
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-04-11T23:11:09Z
publishDate 2019-04-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-a32a2f757eb3416f8c09264c784c7a81
2022-12-22T03:57:50Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2019-04-01
Volume 19, Issue S2, Pages 111-120
10.1186/s12911-019-0763-6
Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units
Chao Yu (School of Computer Science and Technology, Dalian University of Technology)
Jiming Liu (Department of Computer Science, Hong Kong Baptist University)
Hongyi Zhao (School of Computer Science and Technology, Dalian University of Technology)
http://link.springer.com/article/10.1186/s12911-019-0763-6
Reinforcement learning
Inverse learning
Mechanical ventilation
Sedative dosing
Intensive care units
title Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units
topic Reinforcement learning
Inverse learning
Mechanical ventilation
Sedative dosing
Intensive care units
url http://link.springer.com/article/10.1186/s12911-019-0763-6