Learning and Assessing Optimal Dynamic Treatment Regimes Through Cooperative Imitation Learning

Dynamic Treatment Regimes (DTRs) are sets of sequential decision rules that can be adapted over time to treat patients with a specific pathology. A DTR consists of alternative treatment paths, any of which can be adapted depending on the patient's characteristics. Reinforcement Learning (RL) and Imitation Learning (IL) approaches have been deployed to obtain the optimal treatment for a patient, but these approaches rely only on positive trajectories (i.e., treatments that concluded with a positive response from the patient), while negative trajectories (i.e., samples of non-responding treatments) are discarded, although they carry valuable information. We propose a Cooperative Imitation Learning (CIL) method that exploits information from both negative and positive trajectories to learn the optimal DTR. The proposed method reduces the chance of selecting any treatment that results in a negative outcome (a negative response from the patient) during medical examination. To validate our approach, we considered a well-known DTR defined for the treatment of patients with alcohol addiction. Results show that our approach outperforms those that rely only on positive trajectories.


Bibliographic Details
Main Authors: Syed Ihtesham Hussain Shah, Antonio Coronato, Muddasar Naeem, Giuseppe De Pietro
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/9837927/
_version_ 1811344615581679616
author Syed Ihtesham Hussain Shah
Antonio Coronato
Muddasar Naeem
Giuseppe De Pietro
author_facet Syed Ihtesham Hussain Shah
Antonio Coronato
Muddasar Naeem
Giuseppe De Pietro
author_sort Syed Ihtesham Hussain Shah
collection DOAJ
description Dynamic Treatment Regimes (DTRs) are sets of sequential decision rules that can be adapted over time to treat patients with a specific pathology. A DTR consists of alternative treatment paths, any of which can be adapted depending on the patient's characteristics. Reinforcement Learning (RL) and Imitation Learning (IL) approaches have been deployed to obtain the optimal treatment for a patient, but these approaches rely only on positive trajectories (i.e., treatments that concluded with a positive response from the patient), while negative trajectories (i.e., samples of non-responding treatments) are discarded, although they carry valuable information. We propose a Cooperative Imitation Learning (CIL) method that exploits information from both negative and positive trajectories to learn the optimal DTR. The proposed method reduces the chance of selecting any treatment that results in a negative outcome (a negative response from the patient) during medical examination. To validate our approach, we considered a well-known DTR defined for the treatment of patients with alcohol addiction. Results show that our approach outperforms those that rely only on positive trajectories.
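The abstract's core idea is to learn from both positive and negative treatment trajectories rather than discarding the negative ones. A minimal toy sketch of that idea is given below; it is not the paper's actual CIL algorithm, and the function name, scoring scheme, and example states/actions are all illustrative assumptions. State-action pairs are scored up when they appear in positive trajectories and down when they appear in negative ones, and the greedy policy avoids actions seen only in non-responding treatments:

```python
from collections import defaultdict

def learn_cil_policy(positive_trajs, negative_trajs, alpha=1.0, beta=1.0):
    """Toy cooperative-imitation scorer (illustrative, not the paper's method).

    Each trajectory is a list of (state, action) pairs. Pairs from positive
    trajectories gain preference; pairs from negative trajectories lose it.
    """
    score = defaultdict(float)
    for traj in positive_trajs:
        for s, a in traj:
            score[(s, a)] += alpha          # reward imitation of responders
    for traj in negative_trajs:
        for s, a in traj:
            score[(s, a)] -= beta           # penalize non-responding choices
    # Greedy policy: in each state, pick the highest-scoring action.
    actions_by_state = defaultdict(dict)
    for (s, a), v in score.items():
        actions_by_state[s][a] = v
    return {s: max(acts, key=acts.get) for s, acts in actions_by_state.items()}

# Hypothetical example: one responding and one non-responding treatment path.
pos = [[("s0", "medication"), ("s1", "counseling")]]
neg = [[("s0", "medication"), ("s1", "medication")]]
policy = learn_cil_policy(pos, neg)
print(policy["s1"])  # -> counseling
```

In state `s1`, "counseling" appears only in the positive trajectory (score +1) while "medication" appears only in the negative one (score -1), so the learned policy avoids the treatment that led to a negative outcome.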
first_indexed 2024-04-13T19:49:50Z
format Article
id doaj.art-2f40d1c4726f4744888b35b48f68d6ad
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-13T19:49:50Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-2f40d1c4726f4744888b35b48f68d6ad (indexed 2022-12-22T02:32:34Z)
Language: English; Publisher: IEEE; Series: IEEE Access; ISSN: 2169-3536
Published: 2022-01-01; Volume 10, Pages 78148-78158
DOI: 10.1109/ACCESS.2022.3193494 (IEEE document 9837927)
Title: Learning and Assessing Optimal Dynamic Treatment Regimes Through Cooperative Imitation Learning
Authors: Syed Ihtesham Hussain Shah (https://orcid.org/0000-0002-6390-1864); Antonio Coronato (https://orcid.org/0000-0001-8177-032X); Muddasar Naeem (https://orcid.org/0000-0003-0815-4883); Giuseppe De Pietro
Affiliation (all authors): CNR, Institute for High Performance Computing and Networking (ICAR), Napoli, Italy
Online Access: https://ieeexplore.ieee.org/document/9837927/
Keywords: Inverse reinforcement learning; imitation learning; dynamic treatment regime; reinforcement learning; cooperative imitation learning; Markov decision process
spellingShingle Syed Ihtesham Hussain Shah
Antonio Coronato
Muddasar Naeem
Giuseppe De Pietro
Learning and Assessing Optimal Dynamic Treatment Regimes Through Cooperative Imitation Learning
IEEE Access
Inverse reinforcement learning
imitation learning
dynamic treatment regime
reinforcement learning
cooperative imitation learning
Markov decision process
title Learning and Assessing Optimal Dynamic Treatment Regimes Through Cooperative Imitation Learning
title_full Learning and Assessing Optimal Dynamic Treatment Regimes Through Cooperative Imitation Learning
title_fullStr Learning and Assessing Optimal Dynamic Treatment Regimes Through Cooperative Imitation Learning
title_full_unstemmed Learning and Assessing Optimal Dynamic Treatment Regimes Through Cooperative Imitation Learning
title_short Learning and Assessing Optimal Dynamic Treatment Regimes Through Cooperative Imitation Learning
title_sort learning and assessing optimal dynamic treatment regimes through cooperative imitation learning
topic Inverse reinforcement learning
imitation learning
dynamic treatment regime
reinforcement learning
cooperative imitation learning
Markov decision process
url https://ieeexplore.ieee.org/document/9837927/
work_keys_str_mv AT syedihteshamhussainshah learningandassessingoptimaldynamictreatmentregimesthroughcooperativeimitationlearning
AT antoniocoronato learningandassessingoptimaldynamictreatmentregimesthroughcooperativeimitationlearning
AT muddasarnaeem learningandassessingoptimaldynamictreatmentregimesthroughcooperativeimitationlearning
AT giuseppedepietro learningandassessingoptimaldynamictreatmentregimesthroughcooperativeimitationlearning