Predicting High-Risk Prostate Cancer Using Machine Learning Methods

Prostate cancer can be low- or high-risk to the patient’s health. Current screening on the basis of prostate-specific antigen (PSA) levels has a tendency towards both false positives and false negatives, both of which have negative consequences. We obtained a dataset of 35,875 patients fro...

Bibliographic Details
Main Authors: Henry Barlow, Shunqi Mao, Matloob Khushi
Format: Article
Language: English
Published: MDPI AG 2019-09-01
Series: Data
Subjects: prostate cancer screening, PSA rate of change, machine learning, imbalanced dataset
Online Access: https://www.mdpi.com/2306-5729/4/3/129
_version_ 1798040600661983232
author Henry Barlow
Shunqi Mao
Matloob Khushi
collection DOAJ
description Prostate cancer can be low- or high-risk to the patient’s health. Current screening on the basis of prostate-specific antigen (PSA) levels has a tendency towards both false positives and false negatives, both of which have negative consequences. We obtained a dataset of 35,875 patients from the screening arm of the National Cancer Institute’s Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. We segmented the data into instances without prostate cancer, instances with low-risk prostate cancer, and instances with high-risk prostate cancer. We developed a pipeline to deal with imbalanced data and proposed algorithms to perform preprocessing on such datasets. We evaluated the accuracy of various machine learning algorithms in predicting high-risk prostate cancer. An accuracy of 91.5% can be achieved by the proposed pipeline, using standard scaling, SVMSMOTE sampling method, and AdaBoost for machine learning. We then evaluated the contribution of rate of change of PSA, age, BMI, and filtration by race to this model’s accuracy. We identified that including the rate of change of PSA and age in our model increased the area under the curve (AUC) of the model by 6.8%, whereas BMI and race had a minimal effect.
first_indexed 2024-04-11T22:09:54Z
format Article
id doaj.art-4f35a9fba9db4d4381affbd94d65552d
institution Directory Open Access Journal
issn 2306-5729
language English
last_indexed 2024-04-11T22:09:54Z
publishDate 2019-09-01
publisher MDPI AG
record_format Article
series Data
spelling doaj.art-4f35a9fba9db4d4381affbd94d65552d
2022-12-22T04:00:36Z
eng
MDPI AG
Data
2306-5729
2019-09-01
4 3 129
10.3390/data4030129
data4030129
Predicting High-Risk Prostate Cancer Using Machine Learning Methods
Henry Barlow, Shunqi Mao, Matloob Khushi
School of Computer Science, University of Sydney, 2006 Sydney, Australia (all three authors)
https://www.mdpi.com/2306-5729/4/3/129
prostate cancer screening
PSA rate of change
machine learning
imbalanced dataset
title Predicting High-Risk Prostate Cancer Using Machine Learning Methods
topic prostate cancer screening
PSA rate of change
machine learning
imbalanced dataset
url https://www.mdpi.com/2306-5729/4/3/129