Interpretable generalized neural additive models for mortality prediction of COVID-19 hospitalized patients in Hamadan, Iran

Bibliographic Details
Main Authors: Samad Moslehi, Hossein Mahjub, Maryam Farhadian, Ali Reza Soltanian, Mojgan Mamani
Format: Article
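Language: English
Published: BMC 2022-12-01
Series: BMC Medical Research Methodology
Subjects: COVID-19; Feature selection; Laboratory markers; Machine learning; Generalized neural additive; Prediction
Online Access: https://doi.org/10.1186/s12874-022-01827-y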
author Samad Moslehi
Hossein Mahjub
Maryam Farhadian
Ali Reza Soltanian
Mojgan Mamani
collection DOAJ
description Abstract
Background: The high number of COVID-19 deaths is a serious threat to the world. Demographic and clinical biomarkers are significantly associated with the mortality risk of this disease. This study aimed to implement the Generalized Neural Additive Model (GNAM) as an interpretable machine learning method for predicting mortality in COVID-19 patients.
Methods: This cohort study included 2181 COVID-19 patients admitted from February 2020 to July 2021 to Sina and Besat hospitals in Hamadan, in the west of Iran. A total of 22 baseline features, including patients' demographic information and clinical biomarkers, were collected. Four strategies were used to deal with missing data: removal of missing values, mean imputation, K-Nearest Neighbor (KNN) imputation, and Multivariate Imputation by Chained Equations (MICE). First, the important features for predicting the binary outcome (1: death, 0: recovery) were selected using the Random Forest (RF) method. The synthetic minority over-sampling technique (SMOTE) was used to handle the imbalanced data. Next, using the selected features, the predictive performance of GNAM for the mortality outcome was compared with that of logistic regression, RF, generalized additive models (GAMs), gradient boosting decision trees (GBDT), and deep neural networks (DNNs). Each model was trained on fifty different train-test subsets of the dataset to obtain stable performance estimates. Average accuracy, F1-score, and area under the curve (AUC) were used to compare the predictive performance of the models.
Results: Of the 2181 COVID-19 patients, 624 died during hospitalization and 1557 recovered. The missing rate was 3 percent for each patient. The mean age of patients who died (71.17 ± 14.44 years) was significantly higher than that of patients who recovered (58.25 ± 16.52 years). Based on RF, the 10 features with the highest relative importance were selected as the most influential: blood urea nitrogen (BUN), lymphocytes (Lym), age, blood sugar (BS), serum glutamic-oxaloacetic transaminase (SGOT), monocytes (Mono), blood creatinine (CR), neutrophils (NUT), alkaline phosphatase (ALP), and hematocrit (HCT). In the performance comparison, GNAM performed best, with a mean accuracy, F1-score, and AUC on the test dataset of 0.847, 0.691, and 0.774, respectively. The smooth functions learned by GNAM were decreasing for Lym and increasing for the other important features.
Conclusions: Interpretable GNAM can perform well in predicting the mortality of COVID-19 patients. Such a model can therefore help physicians prioritize important demographic and clinical biomarkers by identifying the influential features and the type of predictive trend each shows in disease progression.
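The additive structure that makes a GNAM interpretable can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the usual neural additive form, in which each feature passes through its own small network f_j and the outputs are summed and mapped to a probability by a logistic link. The weights are random and untrained, the three feature names are taken from the selected features listed in the abstract, and the input values, the intercept, and all function names are hypothetical.

```python
import math
import random


def make_shape_net(hidden, rng):
    """Randomly initialised one-hidden-layer net mapping one scalar to one scalar."""
    return {
        "w1": [rng.gauss(0.0, 1.0) for _ in range(hidden)],
        "b1": [rng.gauss(0.0, 1.0) for _ in range(hidden)],
        "w2": [rng.gauss(0.0, 1.0) for _ in range(hidden)],
    }


def shape_value(net, x):
    """Evaluate the shape function f_j(x): tanh hidden layer, linear output."""
    return sum(
        w2 * math.tanh(w1 * x + b1)
        for w1, b1, w2 in zip(net["w1"], net["b1"], net["w2"])
    )


def sigmoid(z):
    """Numerically stable logistic link."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(nets, bias, patient):
    """Return (probability of death, per-feature contributions to the logit)."""
    contributions = {name: shape_value(nets[name], patient[name]) for name in nets}
    logit = bias + sum(contributions.values())
    return sigmoid(logit), contributions


rng = random.Random(0)
feature_names = ["BUN", "Lym", "age"]  # three of the ten selected features
nets = {name: make_shape_net(hidden=4, rng=rng) for name in feature_names}
bias = 0.0  # hypothetical intercept

# Hypothetical standardised inputs, not values from the study.
patient = {"BUN": 1.2, "Lym": -0.8, "age": 0.5}
prob, contributions = predict(nets, bias, patient)
print(round(prob, 3), contributions)
```

Because the logit is a plain sum, each entry of `contributions` can be read on its own, and plotting `shape_value` for one feature over a range of inputs gives the kind of smooth function graph the abstract describes as descending for Lym and ascending for the other features.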
first_indexed 2024-04-11T04:06:06Z
format Article
id doaj.art-353678e8e4aa479bb60b45dc40361abf
institution Directory Open Access Journal
issn 1471-2288
language English
last_indexed 2024-04-11T04:06:06Z
publishDate 2022-12-01
publisher BMC
record_format Article
series BMC Medical Research Methodology
spelling doaj.art-353678e8e4aa479bb60b45dc40361abf
2023-01-01T12:22:17Z
eng
BMC
BMC Medical Research Methodology
1471-2288
2022-12-01
Volume 22, issue 1, pages 1-14
10.1186/s12874-022-01827-y
Samad Moslehi (Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences)
Hossein Mahjub (Department of Biostatistics, School of Public Health, Research Center for Health Sciences, Hamadan University of Medical Sciences)
Maryam Farhadian (Department of Biostatistics, School of Public Health, Research Center for Health Sciences, Hamadan University of Medical Sciences)
Ali Reza Soltanian (Department of Biostatistics, School of Public Health, Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences)
Mojgan Mamani (Brucellosis Research Center, Hamadan University of Medical Sciences)
title Interpretable generalized neural additive models for mortality prediction of COVID-19 hospitalized patients in Hamadan, Iran
topic COVID-19
Feature selection
Laboratory markers
Machine learning
Generalized neural additive
Prediction
url https://doi.org/10.1186/s12874-022-01827-y