Machine learning-based prediction model and visual interpretation for prostate cancer
Abstract Background Most prostate cancer (PCa) diagnoses rely on serum prostate-specific antigen (PSA) testing ahead of biopsy confirmation, but its accuracy needs further improvement. PCa prediction models with high clinical application value are still needed. Methods Benign prostatic hyperplasia...
Main Authors: | Gang Chen, Xuchao Dai, Mengqi Zhang, Zhujun Tian, Xueke Jin, Kun Mei, Hong Huang, Zhigang Wu |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2023-10-01 |
Series: | BMC Urology |
Subjects: | Prostate cancer, Machine learning, Shapley values, Biochemical parameters, Risk threshold |
Online Access: | https://doi.org/10.1186/s12894-023-01316-4 |
_version_ | 1797556178351292416 |
---|---|
author | Gang Chen Xuchao Dai Mengqi Zhang Zhujun Tian Xueke Jin Kun Mei Hong Huang Zhigang Wu |
author_sort | Gang Chen |
collection | DOAJ |
description | Abstract Background Most prostate cancer (PCa) diagnoses rely on serum prostate-specific antigen (PSA) testing ahead of biopsy confirmation, but its accuracy needs further improvement. PCa prediction models with high clinical application value are still needed. Methods Benign prostatic hyperplasia (BPH) and prostate cancer data were obtained from the Chinese National Clinical Medical Science Data Center for retrospective analysis. The model was constructed using the XGBoost algorithm, with patients' age, body mass index (BMI), PSA-related parameters and serum biochemical parameters as model variables. Decision curve analysis (DCA) was used to evaluate the clinical utility of the models. The SHapley Additive exPlanations (SHAP) framework was used to analyze the importance ranking and risk thresholds of the variables. Results A total of 1915 patients were included in this study, of whom 823 (43.0%) had BPH and 1092 (57.0%) had PCa. The XGBoost model performed better (AUC 0.82) than f/tPSA (AUC 0.75), tPSA (AUC 0.68) and fPSA (AUC 0.61). Based on SHAP values, f/tPSA was the most important variable, and the five most important biochemical parameters were inorganic phosphorus (P), potassium (K), creatine kinase MB isoenzyme (CKMB), low-density lipoprotein cholesterol (LDL-C), and creatinine (Cre). PCa risk thresholds for these markers were f/tPSA (0.13), P (1.29 mmol/L), K (4.29 mmol/L), CKMB (11.6 U/L), LDL-C (3.05 mmol/L) and Cre (74.5-99.1 μmol/L). Conclusion The present model has the advantages of widespread availability and high net benefit, especially for underdeveloped countries and regions. Furthermore, these risk thresholds can assist in the diagnosis and screening of prostate cancer in clinical practice. |
first_indexed | 2024-03-10T16:58:14Z |
format | Article |
id | doaj.art-a9313e630b17481d83d7da275f88be36 |
institution | Directory Open Access Journal |
issn | 1471-2490 |
language | English |
last_indexed | 2024-03-10T16:58:14Z |
publishDate | 2023-10-01 |
publisher | BMC |
record_format | Article |
series | BMC Urology |
spelling | doaj.art-a9313e630b17481d83d7da275f88be36; 2023-11-20T11:03:28Z; eng; BMC; BMC Urology; 1471-2490; 2023-10-01; 23118; 10.1186/s12894-023-01316-4; Machine learning-based prediction model and visual interpretation for prostate cancer; Authors: Gang Chen [0], Xuchao Dai [1], Mengqi Zhang [2], Zhujun Tian [3], Xueke Jin [4], Kun Mei [5], Hong Huang [6], Zhigang Wu [7]; Affiliations: [0]-[4] School of Public Health and Management, Wenzhou Medical University; [5] School of Environmental Science and Engineering, Suzhou University of Science and Technology; [6] Center for Health Assessment, Wenzhou Medical University; [7] Department of Urology, The First Affiliated Hospital of Wenzhou Medical University; https://doi.org/10.1186/s12894-023-01316-4; Prostate cancer; Machine learning; Shapley values; Biochemical parameters; Risk threshold |
title | Machine learning-based prediction model and visual interpretation for prostate cancer |
title_sort | machine learning based prediction model and visual interpretation for prostate cancer |
topic | Prostate cancer Machine learning Shapley values Biochemical parameters Risk threshold |
url | https://doi.org/10.1186/s12894-023-01316-4 |
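
The abstract above compares models by AUC and judges clinical utility by net benefit from decision curve analysis. As a minimal sketch of how those two quantities are commonly computed, the Python below uses only the standard library. The function names (`auc`, `net_benefit`, `net_benefit_treat_all`) and the eight-patient toy data are assumptions made for illustration; they are not the authors' code or data, and the record does not describe how the study implemented either metric.

```python
from typing import Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the probability that a random positive
    case scores higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of acting (e.g. referring for biopsy) on every patient whose
    predicted risk is at or above `threshold`: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy in a decision curve: act on every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = PCa, 0 = BPH); NOT taken from the study.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
probs = [0.9, 0.8, 0.35, 0.6, 0.2, 0.7, 0.4, 0.1]

model_auc = auc(labels, probs)
model_nb = net_benefit(labels, probs, threshold=0.5)
treat_all_nb = net_benefit_treat_all(labels, threshold=0.5)
```

A decision curve plots `net_benefit` across a range of thresholds alongside the treat-all and treat-none (net benefit 0) strategies; a model is considered useful at thresholds where its curve lies above both.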