Machine learning-based prediction model and visual interpretation for prostate cancer

Abstract Background: Most prostate cancers (PCa) are detected through serum prostate-specific antigen (PSA) testing followed by biopsy confirmation, but the accuracy of PSA testing needs further improvement. PCa prediction models with high clinical application value are still needed. Methods: Benign prostatic hyperplasia...


Bibliographic Details
Main Authors: Gang Chen, Xuchao Dai, Mengqi Zhang, Zhujun Tian, Xueke Jin, Kun Mei, Hong Huang, Zhigang Wu
Format: Article
Language: English
Published: BMC 2023-10-01
Series: BMC Urology
Subjects: Prostate cancer; Machine learning; Shapley values; Biochemical parameters; Risk threshold
Online Access: https://doi.org/10.1186/s12894-023-01316-4
author Gang Chen
Xuchao Dai
Mengqi Zhang
Zhujun Tian
Xueke Jin
Kun Mei
Hong Huang
Zhigang Wu
collection DOAJ
description Abstract Background: Most prostate cancers (PCa) are detected through serum prostate-specific antigen (PSA) testing followed by biopsy confirmation, but the accuracy of PSA testing needs further improvement. PCa prediction models with high clinical application value are still needed. Methods: Benign prostatic hyperplasia (BPH) and prostate cancer data were obtained from the Chinese National Clinical Medical Science Data Center for retrospective analysis. The model was constructed using the XGBoost algorithm, with patients' age, body mass index (BMI), PSA-related parameters, and serum biochemical parameters as model variables. Decision curve analysis (DCA) was used to evaluate the clinical utility of the models. The Shapley additive explanations (SHAP) framework was used to analyze the importance ranking and risk thresholds of the variables. Results: A total of 1915 patients were included in this study, of whom 823 (43.0%) had BPH and 1092 (57.0%) had PCa. The XGBoost model performed better (AUC 0.82) than f/tPSA (AUC 0.75), tPSA (AUC 0.68), and fPSA (AUC 0.61). Based on SHAP values, f/tPSA was the most important variable, and the five most important biochemical parameters were inorganic phosphorus (P), potassium (K), creatine kinase MB isoenzyme (CKMB), low-density lipoprotein cholesterol (LDL-C), and creatinine (Cre). The PCa risk thresholds for these markers were f/tPSA (0.13), P (1.29 mmol/L), K (4.29 mmol/L), CKMB (11.6 U/L), LDL-C (3.05 mmol/L), and Cre (74.5-99.1 μmol/L). Conclusion: The present model has the advantages of widespread availability and high net benefit, especially for underdeveloped countries and regions. Furthermore, these risk thresholds can assist in the diagnosis and screening of prostate cancer in clinical practice.
first_indexed 2024-03-10T16:58:14Z
format Article
id doaj.art-a9313e630b17481d83d7da275f88be36
institution Directory Open Access Journal
issn 1471-2490
language English
last_indexed 2024-03-10T16:58:14Z
publishDate 2023-10-01
publisher BMC
record_format Article
series BMC Urology
spelling doaj.art-a9313e630b17481d83d7da275f88be36 (2023-11-20T11:03:28Z), eng. BMC, BMC Urology, ISSN 1471-2490, 2023-10-01, 23(1):1-8. DOI 10.1186/s12894-023-01316-4
Machine learning-based prediction model and visual interpretation for prostate cancer
Gang Chen (School of Public Health and Management, Wenzhou Medical University)
Xuchao Dai (School of Public Health and Management, Wenzhou Medical University)
Mengqi Zhang (School of Public Health and Management, Wenzhou Medical University)
Zhujun Tian (School of Public Health and Management, Wenzhou Medical University)
Xueke Jin (School of Public Health and Management, Wenzhou Medical University)
Kun Mei (School of Environmental Science and Engineering, Suzhou University of Science and Technology)
Hong Huang (Center for Health Assessment, Wenzhou Medical University)
Zhigang Wu (Department of Urology, The First Affiliated Hospital of Wenzhou Medical University)
https://doi.org/10.1186/s12894-023-01316-4
Keywords: Prostate cancer; Machine learning; Shapley values; Biochemical parameters; Risk threshold
title Machine learning-based prediction model and visual interpretation for prostate cancer
topic Prostate cancer
Machine learning
Shapley values
Biochemical parameters
Risk threshold
url https://doi.org/10.1186/s12894-023-01316-4
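
The abstract reports two kinds of evaluation: discrimination (AUC) and clinical utility via decision curve analysis (net benefit). As an illustration only, the following minimal sketch computes both quantities in plain Python. It is not the authors' code: the function names and the eight-patient toy data are hypothetical, and the toy scores stand in for the predicted PCa probabilities that a fitted model such as XGBoost would produce.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(labels: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Net benefit at a threshold probability pt:
    TP/n - FP/n * pt / (1 - pt), treating score >= pt as test-positive."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Reference strategy on a decision curve: refer every patient."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = PCa, 0 = BPH); NOT taken from the study.
toy_labels = [1, 1, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1]

if __name__ == "__main__":
    print("AUC:", auc(toy_labels, toy_scores))
    print("Net benefit (model):", net_benefit(toy_labels, toy_scores, 0.5))
    print("Net benefit (treat all):", net_benefit_treat_all(toy_labels, 0.5))
```

Evaluating net_benefit over a grid of thresholds and plotting it against the treat-all and treat-none (net benefit 0) strategies gives a decision curve. A model is usually considered clinically useful over the threshold range where its curve lies above both reference strategies.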