Linear Model Selection and Regularization for Serum Prostate-Specific Antigen Prediction of Patients With Prostate Cancer Using R
Prostate cancer is a commonly diagnosed cancer worldwide, with 1,276 thousand new cases and 359 thousand deaths in 2018. Prostate-specific antigen (PSA) blood level is often elevated in men with prostate cancer, so PSA testing can detect prostate tumours when they are sma...
Main Authors: | Gongli Li, Han Li |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
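Subjects: | Machine learning; linear model selection and regularization; prostate-specific antigen prediction; prostate cancer screening; R programming |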
Online Access: | https://ieeexplore.ieee.org/document/9478855/ |
_version_ | 1828406415597764608 |
---|---|
author | Gongli Li; Han Li |
author_facet | Gongli Li; Han Li |
author_sort | Gongli Li |
collection | DOAJ |
description | Prostate cancer is a commonly diagnosed cancer worldwide, with 1,276 thousand new cases and 359 thousand deaths in 2018. Prostate-specific antigen (PSA) blood level is often elevated in men with prostate cancer, so PSA testing can detect prostate tumours when they are small, low-grade, and localized. PSA testing is hard to apply in less developed and poor areas that lack sufficient medical funds, so early, accurate prediction of PSA level by statistical machine learning models is important for avoiding later stages of prostate cancer that spread outside the prostate. In this paper, we compare three classes of linear model selection and regularization methods (shrinkage, subset selection, dimension reduction) and nine candidate models (OLS regression, ridge regression, lasso regression, elastic net, best subset selection, forward subset selection, backward subset selection, PCR, PLS) based on leave-one-out cross-validation (LOOCV) prediction error. Because the selection criterion, leave-one-out cross-validation, is sensitive to outliers, Mahalanobis distance is used to detect and delete outliers before running each model. The shrinkage methods (lasso and elastic net only) and the subset selection methods (based on adjusted $R^{2}$, BIC, Cp, and cross-validation prediction error) can perform variable selection. The feature selection results show that prostate weight, cancer volume, amount of benign prostatic hyperplasia, and presence of seminal vesicle invasion are necessary variables that must be included when predicting PSA. Age and capsular penetration are the least important variables. Gleason score and the percentage of Gleason scores 4 or 5 are sometimes essential. All diagnostic figures and results are coded in R, open access, and published on IEEE Xplore Code Ocean. |
first_indexed | 2024-12-10T11:11:18Z |
format | Article |
id | doaj.art-32546e0ed31c41fcaa3fa38ce0d845ec |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-10T11:11:18Z |
publishDate | 2021-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-32546e0ed31c41fcaa3fa38ce0d845ec; 2022-12-22T01:51:24Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2021-01-01; vol. 9, pp. 97591-97602; DOI 10.1109/ACCESS.2021.3095914; IEEE Xplore document 9478855; Linear Model Selection and Regularization for Serum Prostate-Specific Antigen Prediction of Patients With Prostate Cancer Using R; Gongli Li (https://orcid.org/0000-0001-7382-7090) and Han Li, The Australian National University, Canberra, ACT, Australia; https://ieeexplore.ieee.org/document/9478855/; Machine learning; linear model selection and regularization; prostate-specific antigen prediction; prostate cancer screening; R programming |
spellingShingle | Gongli Li; Han Li; Linear Model Selection and Regularization for Serum Prostate-Specific Antigen Prediction of Patients With Prostate Cancer Using R; IEEE Access; Machine learning; linear model selection and regularization; prostate-specific antigen prediction; prostate cancer screening; R programming |
title | Linear Model Selection and Regularization for Serum Prostate-Specific Antigen Prediction of Patients With Prostate Cancer Using R |
topic | Machine learning; linear model selection and regularization; prostate-specific antigen prediction; prostate cancer screening; R programming |
url | https://ieeexplore.ieee.org/document/9478855/ |
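
The workflow summarized in the description (delete outliers by Mahalanobis distance, then rank candidate models by LOOCV prediction error) can be illustrated with a minimal sketch. This is not the authors' R code: it is written in Python with NumPy, it runs on synthetic data rather than the prostate data, and it covers only two of the nine candidates (OLS and ridge). The cutoff of 17.535 (roughly the 97.5% chi-square quantile for 8 degrees of freedom), the ridge penalty of 1.0, the choice to compute distances on the predictors only, and all function and variable names are assumptions made for illustration.

```python
import numpy as np


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row from the sample mean."""
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


def loocv_mse(X, y, ridge_penalty=0.0):
    """Leave-one-out mean squared prediction error.

    ridge_penalty=0.0 gives OLS; a positive value gives ridge regression.
    """
    n, p = X.shape
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        X_train, y_train = X[keep], y[keep]
        # Centre with training statistics so the intercept is not penalised.
        x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
        Xc = X_train - x_mean
        beta = np.linalg.solve(
            Xc.T @ Xc + ridge_penalty * np.eye(p),
            Xc.T @ (y_train - y_mean),
        )
        prediction = y_mean + (X[i] - x_mean) @ beta
        errors[i] = (y[i] - prediction) ** 2
    return errors.mean()


# Synthetic stand-in data (assumption: NOT the data used in the article).
rng = np.random.default_rng(0)
n, p = 97, 8
X = rng.normal(size=(n, p))
true_beta = np.array([0.7, 0.5, 0.0, 0.3, 0.4, 0.0, 0.0, 0.0])
y = 2.5 + X @ true_beta + rng.normal(scale=0.7, size=n)

# Step 1: drop rows whose squared distance exceeds the assumed cutoff.
CUTOFF = 17.535  # assumed: approx. 97.5% chi-square quantile, 8 d.o.f.
distance_sq = mahalanobis_sq(X)
keep_rows = distance_sq <= CUTOFF
X_clean, y_clean = X[keep_rows], y[keep_rows]

# Step 2: compare candidates by LOOCV error on the cleaned data.
candidates = {"OLS": 0.0, "ridge (penalty 1.0)": 1.0}
results = {
    name: loocv_mse(X_clean, y_clean, penalty)
    for name, penalty in candidates.items()
}
for name, mse in results.items():
    print(f"{name}: LOOCV MSE = {mse:.4f}")
```

The explicit leave-one-out loop refits each model once per observation, which keeps the logic easy to follow and is cheap at this sample size. In practice the ridge penalty would itself be tuned, for example by the same cross-validation criterion.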