An explainable machine learning model for prediction of high-risk nonalcoholic steatohepatitis

Abstract Early identification of high-risk metabolic dysfunction-associated steatohepatitis (MASH) can offer patients access to novel therapeutic options and potentially decrease the risk of progression to cirrhosis. This study aimed to develop an explainable machine learning model for high-risk MAS...

Bibliographic Details
Main Authors: Basile Njei, Eri Osta, Nelvis Njei, Yazan A. Al-Ajlouni, Joseph K. Lim
Format: Article
Language: English
Published: Nature Portfolio 2024-04-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-024-59183-4
collection DOAJ
description Abstract Early identification of high-risk metabolic dysfunction-associated steatohepatitis (MASH) can offer patients access to novel therapeutic options and potentially decrease the risk of progression to cirrhosis. This study aimed to develop an explainable machine learning model for high-risk MASH prediction and compare its performance with well-established biomarkers. Data were derived from the National Health and Nutrition Examination Surveys (NHANES) 2017-March 2020, which included a total of 5281 adults with valid elastography measurements. We used a FAST score ≥ 0.35, calculated using liver stiffness measurement and controlled attenuation parameter values and aspartate aminotransferase levels, to identify individuals with high-risk MASH. We developed an ensemble-based machine learning XGBoost model to detect high-risk MASH and explored the model’s interpretability using an explainable artificial intelligence SHAP method. The prevalence of high-risk MASH was 6.9%. Our XGBoost model achieved a high level of sensitivity (0.82), specificity (0.91), accuracy (0.90), and AUC (0.95) for identifying high-risk MASH. Our model demonstrated a superior ability to predict high-risk MASH vs. FIB-4, APRI, BARD, and MASLD fibrosis scores (AUC of 0.95 vs. 0.50, 0.50, 0.49 and 0.50, respectively). To explain the high performance of our model, we found that the top 5 predictors of high-risk MASH were ALT, GGT, platelet count, waist circumference, and age. We used an explainable ML approach to develop a clinically applicable model that outperforms commonly used clinical risk indices and could increase the identification of high-risk MASH patients in resource-limited settings.
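The abstract reports sensitivity, specificity, accuracy, and AUC, and contrasts an AUC of 0.95 with values near 0.50 for the comparator scores. As a minimal sketch of how these four metrics are conventionally defined (not the authors' code; the function names, the 0.5 decision threshold, and the toy data are illustrative assumptions), they can be computed in plain Python as follows:

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true positives that the model flags: tp / (tp + fn)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true negatives that the model clears: tn / (tn + fp)."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all cases classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative (ties count one half). A value of 0.5 means the score ranks
    cases no better than chance."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study).
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # illustrative threshold
```

Sensitivity, specificity, and accuracy depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds. That is why a comparator score with an AUC near 0.50 is described as performing at chance level.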
first_indexed 2024-04-24T09:54:49Z
format Article
id doaj.art-e1e312dd308649e99ddfa84b73b8c4b1
institution Directory Open Access Journal
issn 2045-2322
language English
last_indexed 2024-04-24T09:54:49Z
publishDate 2024-04-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-e1e312dd308649e99ddfa84b73b8c4b1 2024-04-14T11:13:33Z eng Nature Portfolio Scientific Reports 2045-2322 2024-04-01 vol. 14, no. 1, pp. 1-9 10.1038/s41598-024-59183-4
author affiliations Basile Njei: Section of Digestive Diseases, Yale School of Medicine
Eri Osta: University of Texas Health San Antonio
Nelvis Njei: Centers for Medicare and Medicaid Services
Yazan A. Al-Ajlouni: School of Medicine, New York Medical College
Joseph K. Lim: Section of Digestive Diseases, Yale School of Medicine
title An explainable machine learning model for prediction of high-risk nonalcoholic steatohepatitis
url https://doi.org/10.1038/s41598-024-59183-4