Ensemble machine learning prediction of hyperuricemia based on a prospective health checkup population

Objectives: An accurate prediction model for hyperuricemia (HUA) in adults remains unavailable. This study aimed to develop a stacking ensemble prediction model for HUA to identify high-risk groups and explore risk factors. Methods: A prospective health checkup cohort of 40899 subjects was examined an...

Bibliographic Details
Main Authors: Yongsheng Zhang, Li Zhang, Haoyue Lv, Guang Zhang
Format: Article
Language: English
Published: Frontiers Media S.A. 2024-04-01
Series: Frontiers in Physiology
Subjects: hyperuricemia, prediction model, machine learning, stacking ensemble, risk factors
Online Access: https://www.frontiersin.org/articles/10.3389/fphys.2024.1357404/full
author Yongsheng Zhang
Li Zhang
Haoyue Lv
Guang Zhang
collection DOAJ
description Objectives: An accurate prediction model for hyperuricemia (HUA) in adults remains unavailable. This study aimed to develop a stacking ensemble prediction model for HUA to identify high-risk groups and explore risk factors.
Methods: A prospective health checkup cohort of 40,899 subjects was examined and randomly divided into training and validation sets at a ratio of 7:3. LASSO regression was used to screen important features, and ROSE sampling was then used to handle the class imbalance. A stacking ensemble model was constructed from three individual models: support vector machine, decision tree C5.0, and eXtreme gradient boosting. The models were validated using the area under the receiver operating characteristic curve (AUC) and the calibration curve, together with accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score. A model-agnostic, instance-level variable attribution technique (iBreakdown) was used to explain our black-box ensemble model and to identify contributing risk factors.
Results: Fifteen important features were selected from 23 clinical variables. Our stacking ensemble model, with an AUC of 0.854, outperformed its three component models, support vector machine, decision tree C5.0, and eXtreme gradient boosting, which had AUCs of 0.848, 0.851, and 0.849, respectively. Calibration, as well as accuracy, specificity, negative predictive value, and F1 score, also supported the superiority of our ensemble model. The contributing risk factors were estimated using six randomly selected subjects, which showed that being female and relatively younger, together with having higher baseline uric acid, body mass index, γ-glutamyl transpeptidase, total protein, triglycerides, creatinine, and fasting blood glucose, can increase the risk of HUA. To further validate the model's applicability in the health checkup population, we used another cohort of 8,559 subjects, in which our ensemble prediction model also performed favorably, with an AUC of 0.846.
Conclusion: In this study, a stacking ensemble prediction model for HUA was developed, and it outperformed the three individual models that compose it (support vector machine, decision tree C5.0, and eXtreme gradient boosting). The contributing risk factors were also identified.
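The validation metrics named in the description (AUC, accuracy, sensitivity, specificity, positive and negative predictive value, F1 score) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the eight-subject toy data are all hypothetical and chosen only to make the definitions concrete.

```python
import math
from typing import Dict, Sequence


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute confusion-matrix metrics for binary labels (1 = HUA, 0 = no HUA)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = _ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)  # true negative rate
    ppv = _ratio(tp, tp + fp)          # positive predictive value (precision)
    npv = _ratio(tn, tn + fn)          # negative predictive value
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": _ratio(2 * ppv * sensitivity, ppv + sensitivity),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    This pairwise form is quadratic in sample size, fine for a small sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: true labels and predicted risk scores for 8 subjects.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

    print("AUC:", roc_auc(y_true, scores))
    for name, value in classification_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

AUC depends only on how the scores rank the subjects, while the other six metrics depend on the chosen cut-off, which is why the description reports them separately.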
first_indexed 2024-04-24T11:18:08Z
id doaj.art-713a26a6dd124a5a9432d093c1ce677b
institution Directory of Open Access Journals
issn 1664-042X
last_indexed 2024-04-24T11:18:08Z
volume 15
doi 10.3389/fphys.2024.1357404
affiliation Yongsheng Zhang, Haoyue Lv, Guang Zhang: Health Management Center, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China; Institute of Health Management, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China; Shandong Engineering Laboratory of Health Management, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
Li Zhang: Department of Pharmacology, Jinan Central Hospital Affiliated to Shandong First Medical University, Jinan, China
title Ensemble machine learning prediction of hyperuricemia based on a prospective health checkup population
topic hyperuricemia
prediction model
machine learning
stacking ensemble
risk factors