Detection of the chronic kidney disease using XGBoost classifier and explaining the influence of the attributes on the model using SHAP

Bibliographic Details
Main Authors: Md. Johir Raihan, Md. Al-Masrur Khan, Seong-Hoon Kee, Abdullah-Al Nahid
Format: Article
Language: English
Published: Nature Portfolio, 2023-04-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-33525-0
ISSN: 2045-2322
Affiliations: Md. Johir Raihan and Abdullah-Al Nahid, Electronics and Communication Engineering Discipline, Khulna University; Md. Al-Masrur Khan and Seong-Hoon Kee, Department of ICT Integrated Ocean Smart Cities Engineering, Dong-A University
Collection: DOAJ (Directory of Open Access Journals)

Abstract
Chronic kidney disease (CKD) is a condition characterized by structural and functional changes to the kidney over time. Studies show that 10% of adults worldwide are affected by some form of CKD, resulting in 1.2 million deaths. CKD has recently emerged as a leading cause of mortality worldwide, making it necessary to develop a Computer-Aided Diagnostic (CAD) system that diagnoses CKD automatically. Clinicians can use a Machine Learning (ML)-based CAD system to diagnose large numbers of people automatically. Because ML models are considered black boxes, it is also necessary to expose the influential causes behind a model's prediction of a particular output, so that a doctor can make a more rational decision based on the model's output and on an analysis of each feature's influence on the model. In this paper, we use XGBoost as the ML classifier to predict whether or not a patient has CKD. Using all 24 features, the XGBoost classifier achieves an accuracy, precision, recall, and F1 score of 99.16%, 100%, 98.68%, and 99.33%, respectively. Furthermore, we use the Biogeography-Based Optimization (BBO) algorithm to find an effective subset of the features; BBO selects almost half of the initial features. Using only the 13 features selected by BBO, we obtain an accuracy, precision, recall, and F1 score of 98.33%, 100%, 97.36%, and 98.67%, respectively. Finally, we explain the impact of the features on the ML models using SHapley Additive exPlanations (SHAP) analysis. Using SHAP analysis and the BBO algorithm, we find that hemoglobin and albumin contribute most to the detection of CKD.
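
XGBoost builds its ensemble one tree at a time and picks each split to maximize a second-order gain computed from the gradients g and Hessians h of the loss; for binary log loss on a raw margin, g = p - y and h = p(1 - p), and a leaf's optimal value is w* = -G/(H + lambda). The following is a minimal standard-library sketch of that exact greedy split search on a single feature during the first boosting round. It follows the textbook XGBoost formulas rather than the authors' training code, omits the library's histogram binning, missing-value handling, and column sampling, and the hemoglobin readings, labels, and lam/gamma values are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def grad_hess(margins: List[float], labels: List[int]) -> Tuple[List[float], List[float]]:
    """Gradients and Hessians of binary log loss with respect to the raw margin."""
    probs = [sigmoid(m) for m in margins]
    grads = [p - y for p, y in zip(probs, labels)]
    hess = [p * (1.0 - p) for p in probs]
    return grads, hess


def leaf_weight(g_sum: float, h_sum: float, lam: float = 1.0) -> float:
    """Optimal leaf value w* = -G / (H + lambda)."""
    return -g_sum / (h_sum + lam)


def best_split(values: List[float], grads: List[float], hess: List[float],
               lam: float = 1.0, gamma: float = 0.0) -> Tuple[float, float]:
    """Exact greedy search for the threshold with the largest XGBoost split gain."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    g_total, h_total = sum(grads), sum(hess)

    def score(g: float, h: float) -> float:
        return g * g / (h + lam)

    best_gain, best_threshold = float("-inf"), float("nan")
    g_left = h_left = 0.0
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        g_left += grads[i]
        h_left += hess[i]
        if values[i] == values[nxt]:
            continue  # no threshold separates identical values
        gain = 0.5 * (score(g_left, h_left)
                      + score(g_total - g_left, h_total - h_left)
                      - score(g_total, h_total)) - gamma
        if gain > best_gain:
            best_gain = gain
            best_threshold = (values[i] + values[nxt]) / 2
    return best_gain, best_threshold


# Hypothetical hemoglobin readings (g/dL) and labels (1 = CKD, 0 = not CKD).
hemoglobin = [9.5, 10.1, 11.0, 12.0, 14.5, 15.0, 15.8, 16.2]
ckd = [1, 1, 1, 1, 0, 0, 0, 0]

# First boosting round in this sketch: every margin starts at 0, so p = 0.5.
g, h = grad_hess([0.0] * len(ckd), ckd)
gain, threshold = best_split(hemoglobin, g, h)
left = [i for i, v in enumerate(hemoglobin) if v < threshold]
w_left = leaf_weight(sum(g[i] for i in left), sum(h[i] for i in left))
print(f"split at {threshold:.2f} g/dL, gain {gain:.3f}, left leaf weight {w_left:.3f}")
```

With these toy values the best threshold falls between the CKD and non-CKD readings, and the low-hemoglobin leaf receives a positive margin, which pushes its predictions toward CKD.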
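
The four reported scores all follow from a binary confusion matrix with CKD as the positive class. The sketch below computes them from counts. The counts are hypothetical: the abstract reports neither the test-set size nor the confusion matrix, and the values used here (a 120-patient split with 76 CKD and 44 non-CKD cases) were chosen only because they are one configuration consistent with the published percentages.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion-matrix counts with CKD as the positive class."""
    tp: int  # CKD patients predicted as CKD
    fp: int  # non-CKD patients predicted as CKD
    fn: int  # CKD patients predicted as non-CKD
    tn: int  # non-CKD patients predicted as non-CKD


def scores(cm: ConfusionMatrix) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 score, as percentages."""
    accuracy = (cm.tp + cm.tn) / (cm.tp + cm.fp + cm.fn + cm.tn)
    precision = cm.tp / (cm.tp + cm.fp)  # share of CKD predictions that are correct
    recall = cm.tp / (cm.tp + cm.fn)     # share of CKD patients that are detected
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": 100 * accuracy, "precision": 100 * precision,
            "recall": 100 * recall, "f1": 100 * f1}


# Hypothetical counts for a 120-patient test split (76 CKD, 44 non-CKD);
# the abstract does not report the actual split or confusion matrix.
all_features = ConfusionMatrix(tp=75, fp=0, fn=1, tn=44)
bbo_subset = ConfusionMatrix(tp=74, fp=0, fn=2, tn=44)

for label, cm in [("all 24 features", all_features), ("13 BBO features", bbo_subset)]:
    print(label, {name: round(value, 2) for name, value in scores(cm).items()})
```

Rounded to two decimals, these counts give 99.17%, 100%, 98.68%, and 99.34% for the full feature set and 98.33%, 100%, 97.37%, and 98.67% for the BBO subset, differing from the reported figures by at most 0.01 percentage points. On a split of this size, a single missed CKD case costs about 1.3 points of recall.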
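
In Biogeography-Based Optimization, each candidate solution is a habitat whose variables migrate from fitter habitats (high emigration rate mu) to weaker ones (high immigration rate lambda), followed by mutation and elitism. For feature selection, a habitat can be a binary mask over the attributes. The sketch below implements a basic BBO with the linear migration model and a simplified fixed-rate bit-flip mutation. The toy_fitness function is a hypothetical stand-in for a real evaluation score, such as the validation accuracy of a classifier trained on the selected columns, and the population size, mutation rate, and generation count are illustrative rather than the paper's settings.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def bbo_select(n_features: int, fitness: Callable[[Mask], float],
               pop_size: int = 20, generations: int = 50,
               mutation_rate: float = 0.02, n_elite: int = 2,
               seed: int = 0) -> Tuple[Mask, List[float]]:
    """Basic Biogeography-Based Optimization over binary feature masks.

    Returns the best mask found and the best fitness after each generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        # Rank habitats best-first by their suitability index (fitness).
        pop.sort(key=fitness, reverse=True)
        elites = [habitat[:] for habitat in pop[:n_elite]]
        # Linear migration model: fit habitats emigrate, unfit ones immigrate.
        mu = [(pop_size - rank) / (pop_size + 1) for rank in range(pop_size)]
        lam = [1.0 - m for m in mu]
        new_pop = []
        for i, habitat in enumerate(pop):
            child = habitat[:]
            for j in range(n_features):
                if rng.random() < lam[i]:
                    # Roulette-wheel choice of a source habitat, weighted by mu.
                    source = rng.choices(range(pop_size), weights=mu)[0]
                    child[j] = pop[source][j]
                if rng.random() < mutation_rate:
                    child[j] = 1 - child[j]  # simplified bit-flip mutation
            new_pop.append(child)
        # Elitism: last generation's best habitats replace the worst new ones.
        new_pop.sort(key=fitness, reverse=True)
        pop = new_pop[:pop_size - n_elite] + elites
        history.append(max(fitness(habitat) for habitat in pop))
    return max(pop, key=fitness), history


# Hypothetical stand-in for a model score: reward features 0-5 and lightly
# penalize every other selected feature.
INFORMATIVE = set(range(6))


def toy_fitness(mask: Mask) -> float:
    hits = sum(mask[j] for j in INFORMATIVE)
    return hits - 0.1 * (sum(mask) - hits)


best_mask, history = bbo_select(24, toy_fitness)
print("selected features:", [j for j, bit in enumerate(best_mask) if bit])
print("best fitness, first and last generation:", history[0], history[-1])
```

Because the elite habitats are carried into every new generation, the best fitness recorded in history never decreases. When the fitness function retrains a classifier, caching scores per mask avoids re-evaluating identical habitats.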
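
SHAP attributes a single prediction to the input features using Shapley values: feature i receives phi_i, the sum over subsets S of the other features of |S|! (n - |S| - 1)! / n! * [v(S with i) - v(S)], and the attributions add up to v(all features) - v(no features). The sketch below enumerates every subset exactly, which is only feasible for a handful of features, and assumes a simple baseline value function in which features outside S take reference values. The shap library's TreeExplainer, the usual choice for XGBoost models, uses a faster tree-specific algorithm and typically a value function based on training data rather than a single reference point, so its numbers will generally differ from this sketch. The risk model, feature names, patient, and reference values are all hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet

Point = Dict[str, float]


def shapley_values(f: Callable[[Point], float], x: Point, baseline: Point) -> Dict[str, float]:
    """Exact Shapley values of f at x by enumerating every feature coalition.

    Assumes a baseline value function: features outside the coalition are
    replaced by their reference values. Cost grows as 2^n, so keep n small.
    """
    names = list(x)
    n = len(names)

    def value(coalition: FrozenSet[str]) -> float:
        point = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return f(point)

    phi: Dict[str, float] = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk(p: Point) -> float:
    # Hypothetical risk score, not a trained model: rises as hemoglobin falls
    # and as the albumin grade rises, with an interaction between the two.
    anemia = 15.0 - p["hemoglobin"]
    return (0.4 * anemia + 0.8 * p["albumin"] + 0.1 * anemia * p["albumin"]
            + 0.01 * (p["blood_pressure"] - 80.0))


patient = {"hemoglobin": 9.0, "albumin": 3.0, "blood_pressure": 90.0}
reference = {"hemoglobin": 15.0, "albumin": 0.0, "blood_pressure": 80.0}
phi = shapley_values(toy_risk, patient, reference)
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name:>15}: {contribution:+.3f}")
print("sum:", round(sum(phi.values()), 6),
      "f(x) - f(reference):", round(toy_risk(patient) - toy_risk(reference), 6))
```

Here the interaction between anemia and albumin is split equally between the two features, and the three attributions sum to f(patient) - f(reference).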