Linear Model Selection and Regularization for Serum Prostate-Specific Antigen Prediction of Patients With Prostate Cancer Using R

Bibliographic Details
Main Authors: Gongli Li, Han Li
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9478855/
ISSN: 2169-3536
Volume: 9
Pages: 97591-97602
DOI: 10.1109/ACCESS.2021.3095914
Affiliation: The Australian National University, Canberra, ACT, Australia (both authors)
ORCID (Gongli Li): https://orcid.org/0000-0001-7382-7090

Description
Prostate cancer is one of the most commonly diagnosed cancers worldwide, with about 1,276,000 new cases and 359,000 deaths in 2018. Blood levels of prostate-specific antigen (PSA) are often elevated in men with prostate cancer, so PSA testing can detect prostate tumours while they are still small, low-grade, and localized. However, PSA testing is hard to provide in less developed, low-income areas that lack sufficient medical funding. Accurate early prediction of PSA levels with statistical machine learning models is therefore valuable for catching prostate cancer before it reaches later stages in which it has spread outside the prostate.

In this article, we compare three families of linear model selection and regularization methods (shrinkage, subset selection, and dimension reduction) and nine candidate models (ordinary least squares (OLS) regression, ridge regression, lasso regression, elastic net, best subset selection, forward stepwise selection, backward stepwise selection, principal components regression (PCR), and partial least squares (PLS)) by their leave-one-out cross-validation (LOOCV) prediction error. Because LOOCV, the selection criterion, is sensitive to outliers, outliers are detected with the Mahalanobis distance and deleted before each model is fitted. Among the shrinkage methods, only the lasso and the elastic net perform variable selection, because only they can set coefficients exactly to zero; the subset selection methods select variables using adjusted R², BIC, Cp, and cross-validation prediction error.
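The outlier-screening and evaluation steps described above can be sketched as follows. This is a Python/NumPy illustration, not the authors' R code (which is published on Code Ocean). It assumes the Mahalanobis distance is computed on the predictors and the response jointly, and that rows whose squared distance exceeds a user-chosen cutoff are deleted; the cutoff value and the helper names `mahalanobis_sq`, `drop_outliers`, and `loocv_mse_ols` are hypothetical. For OLS, LOOCV does not need n refits: the standard hat-matrix identity gives the exact LOOCV error from a single fit.

```python
import numpy as np


def mahalanobis_sq(Z):
    """Squared Mahalanobis distance of each row of Z from the column means."""
    Z = np.asarray(Z, dtype=float)
    centered = Z - Z.mean(axis=0)
    # Sample covariance with rows as observations; atleast_2d handles one column.
    cov = np.atleast_2d(np.cov(Z, rowvar=False))
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse guards against a singular covariance
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


def drop_outliers(X, y, cutoff):
    """Hypothetical rule: drop rows whose squared distance on [X, y] exceeds cutoff."""
    d2 = mahalanobis_sq(np.column_stack([X, y]))
    keep = d2 <= cutoff
    return X[keep], y[keep], keep


def loocv_mse_ols(X, y):
    """Exact LOOCV mean squared error of OLS (with intercept) from one fit.

    Uses CV = mean(((y_i - yhat_i) / (1 - h_ii))^2), where h_ii are the
    diagonal entries of the hat matrix H = A (A^T A)^{-1} A^T.
    """
    A = np.column_stack([np.ones(len(y)), X])  # design matrix with intercept column
    H = A @ np.linalg.pinv(A)                  # hat matrix (A assumed full column rank)
    resid = y - H @ y                          # residuals of the full-data fit
    leverage = np.diag(H)
    return float(np.mean((resid / (1.0 - leverage)) ** 2))


# Usage on synthetic data (not the study's data): plant one gross outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([0.6, 0.3, 0.0]) + rng.normal(scale=0.5, size=100)
X[0] += 8.0
# 13.28 is roughly the 0.99 quantile of a chi-square with 4 degrees of freedom
# (3 predictors + response); the choice of threshold is an assumption.
X_clean, y_clean, keep = drop_outliers(X, y, cutoff=13.28)
print(keep.sum(), loocv_mse_ols(X_clean, y_clean))
```

The single-fit shortcut is exact only for least-squares fits on a fixed design. For the lasso, the elastic net, or a subset search, LOOCV generally has to refit the whole procedure n times, once per held-out observation.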
The feature selection results show that prostate weight, cancer volume, the amount of benign prostatic hyperplasia, and the presence of seminal vesicle invasion are variables that must be included when predicting PSA. Age and capsular penetration are the least important variables, while the Gleason score and the percentage of Gleason scores 4 or 5 are sometimes essential. All diagnostic figures and results were produced in R; the code is open access and published on IEEE Xplore Code Ocean.
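The subset-selection criteria behind these feature selection results can be illustrated with a brute-force best subset search. This is again a hypothetical Python/NumPy sketch rather than the authors' R code. It uses common textbook definitions for a least-squares model with d predictors, with the error variance σ̂² estimated from the full model: Cp = (RSS + 2dσ̂²)/n, BIC = (RSS + log(n)·d·σ̂²)/n, and adjusted R² = 1 − [RSS/(n − d − 1)]/[TSS/(n − 1)]. R packages may report these criteria on a different scale. The names `rss_of` and `best_subset` are illustrative.

```python
import itertools

import numpy as np


def rss_of(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)


def best_subset(X, y):
    """Exhaustive best subset selection scored by Cp, BIC, and adjusted R^2.

    For each size d, the subset with the smallest RSS is kept; the criteria
    then compare the winners of different sizes.
    """
    n, p = X.shape
    tss = float(((y - y.mean()) ** 2).sum())
    # Error variance estimated from the full model with all p predictors.
    sigma2 = rss_of(X, y, range(p)) / (n - p - 1)
    results = []
    for d in range(1, p + 1):
        cols = min(itertools.combinations(range(p), d),
                   key=lambda c: rss_of(X, y, c))
        rss = rss_of(X, y, cols)
        results.append({
            "cols": cols,
            "cp": (rss + 2 * d * sigma2) / n,                     # lower is better
            "bic": (rss + np.log(n) * d * sigma2) / n,            # lower is better
            "adj_r2": 1 - (rss / (n - d - 1)) / (tss / (n - 1)),  # higher is better
        })
    return results


# Usage on synthetic data: only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=100)
results = best_subset(X, y)
for name, pick in (("cp", min), ("bic", min), ("adj_r2", max)):
    best = pick(results, key=lambda r: r[name])
    print(name, best["cols"])
```

Exhaustive search fits 2^p − 1 models, which is cheap for a handful of predictors. Forward stepwise selection fits only 1 + p(p + 1)/2 models, at the cost of no longer guaranteeing the lowest-RSS subset of each size.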

Subjects: Machine learning; linear model selection and regularization; prostate-specific antigen prediction; prostate cancer screening; R programming