Explainable machine learning prediction of ICU mortality

Background: A variety of mortality prediction models exist for patients in intensive care units (ICU) to guide appropriate clinical management. Advances in machine learning methodologies typically employ classifiers such as neural networks and random forests, which are often regarded by healthcare p...

Bibliographic Details
Main Authors:	Alvin Har Teck Chia, May Sze Khoo, Andy Zhengyi Lim, Kian Eng Ong, Yixuan Sun, Binh P. Nguyen, Matthew Chin Heng Chua, Junxiong Pang
Format:	Article
Language:	English
Published:	Elsevier 2021-01-01
Series:	Informatics in Medicine Unlocked
Subjects:	Mortality prediction; Explainable machine learning; ICU; Cox-proportional hazards; Feature selection
Online Access:	http://www.sciencedirect.com/science/article/pii/S2352914821001593

collection	DOAJ
description	Background: A variety of mortality prediction models exist for patients in intensive care units (ICU) to guide appropriate clinical management. Advances in machine learning methodologies typically employ classifiers such as neural networks and random forests, which healthcare professionals often regard as black boxes. These models often do not provide clear links between the input features and the output clinical event. We investigate whether features identified by a Cox proportional hazards (CPH) model can be used for ICU mortality prediction.
Methods: We employ the PhysioNet Challenge 2012 dataset, a subset of the MIMIC-II Clinical Database covering ICU patients admitted to Boston's Beth Israel Deaconess Medical Center from 2001 to 2008. The dataset is split into train set A, test set B and unseen set C, with 4000 patients each. The analysis is implemented in Python with the scikit-learn and lifelines packages. Besides white-box feature selection methods (logistic regression and decision tree), we also explore using the CPH model for feature selection. We then train machine learning models using classifiers such as logistic regression and variants of decision tree. Extreme gradient boosted tree models performed better than the other classifiers. The model is validated using 5-fold cross-validation and evaluated against unseen set C. Model performance is assessed using the area under the precision-recall curve (AUC-PR) as the main metric.
Findings: Data from about 12,000 patients are used, providing a high degree of generalizability. The number of statistically significant features identified by CPH (n = 16) is substantially smaller than the number identified by logistic regression (n = 36) or decision tree (n = 26), and than the full feature set (n = 42). With only 16 features, the model achieves an AUC-PR of 0·438 on test set B, close to the models built on features selected by decision tree (AUC-PR 0·442) and logistic regression (AUC-PR 0·446) and on all features (AUC-PR 0·446).
Interpretation: The smaller feature set identified by CPH allows the building of a model that is easily interpretable by clinicians whilst still achieving results comparable to those of the other models. This finding allows clinicians to use CPH as an alternative method to determine and act on features that need to be closely monitored for ICU patients.
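The abstract reports AUC-PR as its main metric. As a rough illustration of what that number measures, the sketch below computes the step-wise area under the precision-recall curve (often called average precision) in plain Python. The function name `average_precision` and the toy labels and scores are hypothetical and not taken from the article; the sketch assumes binary labels (1 = died, 0 = survived) and distinct scores, and the authors' own computation may differ.

```python
def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve.

    y_true  : list of 0/1 labels (1 = positive class, e.g. death)
    y_score : list of predicted risks, higher = more likely positive
    Assumes all scores are distinct (ties are not grouped).
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank cases from highest to lowest predicted risk.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at the cutoff where this positive is recovered;
            # each positive contributes an equal slice of recall.
            ap += true_pos / rank
    return ap / total_pos


# Toy example (made-up numbers): 2 deaths among 8 patients.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
risks = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
print(round(average_precision(labels, risks), 3))
```

Unlike the area under the ROC curve, the value expected from a random ranking is roughly the prevalence of the positive class rather than 0.5, so AUC-PR figures are best read against the mortality rate of the cohort.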
id	doaj.art-1ba0b64dabb743b9a7298a3078b5caa3
institution	Directory Open Access Journal
issn	2352-9148
spelling	doaj.art-1ba0b64dabb743b9a7298a3078b5caa3; 2022-12-21T18:37:46Z; eng; Elsevier; Informatics in Medicine Unlocked; ISSN 2352-9148; 2021-01-01; volume 25; article 100674
Author affiliations:
Alvin Har Teck Chia: Institute of Systems Science, National University of Singapore, Singapore
May Sze Khoo: Institute of Systems Science, National University of Singapore, Singapore
Andy Zhengyi Lim: Institute of Systems Science, National University of Singapore, Singapore
Kian Eng Ong: Institute of Systems Science, National University of Singapore, Singapore
Yixuan Sun: Institute of Systems Science, National University of Singapore, Singapore
Binh P. Nguyen: School of Mathematics and Statistics, Victoria University of Wellington, New Zealand (corresponding author)
Matthew Chin Heng Chua: Institute of Systems Science, National University of Singapore, Singapore; Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore; Centre for Infectious Disease Epidemiology and Research, Singapore (corresponding author; Institute of Systems Science, National University of Singapore, Singapore)
Junxiong Pang: Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore; Centre for Infectious Disease Epidemiology and Research, Singapore (corresponding author; Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore)

Similar Items