A hybrid machine learning approach for predicting survival of patients with prostate cancer: A SEER-based population study

With the massive incidence of cancer in recent centuries, it is crucial to carefully analyze the recorded information and provide a thought-out plan for patients’ treatment. A prevalent type of cancer among men, which takes many lives annually, is prostate cancer. The widespread use of machine learn...

Bibliographic Details
Main Authors: N. Momenzadeh, H. Hafezalseheh, M.R. Nayebpour, M. Fathian, R. Noorossana
Format: Article
Language: English
Published: Elsevier 2021-01-01
Series: Informatics in Medicine Unlocked
Subjects: Prostate cancer; Machine learning; FAMD; Cluster centroids; SEER
Online Access: http://www.sciencedirect.com/science/article/pii/S2352914821002379
_version_ 1819101988847091712
author N. Momenzadeh
H. Hafezalseheh
M.R. Nayebpour
M. Fathian
R. Noorossana
author_facet N. Momenzadeh
H. Hafezalseheh
M.R. Nayebpour
M. Fathian
R. Noorossana
author_sort N. Momenzadeh
collection DOAJ
description With the massive incidence of cancer in recent centuries, it is crucial to carefully analyze the recorded information and provide a thought-out plan for patients’ treatment. A prevalent type of cancer among men, which takes many lives annually, is prostate cancer. The widespread use of machine learning methods can be beneficial in addressing prostate cancer and reducing the large number of patients who die from it. In this research, we proposed a hybrid methodology for predicting the survivability of patients with prostate cancer. As pre-processing steps on the SEER dataset, we applied the Factor Analysis of Mixed Data (FAMD) algorithm along with under-sampling methods. The main models, namely XGBoost, random forest (RF), support vector machine (SVM), and logistic regression (LR), were then tuned with cross-validation and used to predict both binary-labeled and multi-class-labeled cases (the latter including other causes of death), which have rarely been investigated in related studies. A sensitivity analysis used cluster centroids as the under-sampling method, examining different proportions of the majority and minority classes when training the binary classifier. This analysis showed that the ratio of the binary classes can influence prediction accuracy and help prevent overfitting. Evaluating the models with suitable criteria such as G-mean, we found that XGBoost (86.28%) and SVM (67.81%) outperformed the other models for the two-class and three-class outcomes, respectively. Compared with similar studies, our method successfully separated patients by mortality status and by whether they died of prostate cancer, which can be important for clinical decision making and for determining whether medical experts need to change their treatment strategy.
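The description names G-mean as an evaluation criterion but does not define it. The sketch below assumes the common definition, the geometric mean of per-class recall, which may differ from the exact formula the authors used. The function name g_mean and the toy labels are hypothetical and are not taken from the SEER data.

```python
from math import prod


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition of G-mean)."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Number of true members of class c, and how many were predicted as c.
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return prod(recalls) ** (1.0 / len(recalls))


# Invented, imbalanced toy labels: 0 = majority class, 1 = minority class.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = g_mean(y_true, y_pred)
print(f"accuracy={accuracy:.2f}  g_mean={score:.4f}")
```

On imbalanced labels like these, plain accuracy stays high even when half of the minority class is missed, while the G-mean drops. This is one plausible reason to prefer it when comparing models trained on under-sampled data.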
first_indexed 2024-12-22T01:27:25Z
format Article
id doaj.art-e9d2c4b9f13f497ab86214a0e51c724c
institution Directory Open Access Journal
issn 2352-9148
language English
last_indexed 2024-12-22T01:27:25Z
publishDate 2021-01-01
publisher Elsevier
record_format Article
series Informatics in Medicine Unlocked
spelling doaj.art-e9d2c4b9f13f497ab86214a0e51c724c 2022-12-21T18:43:35Z eng Elsevier Informatics in Medicine Unlocked 2352-9148 2021-01-01 27 100763
A hybrid machine learning approach for predicting survival of patients with prostate cancer: A SEER-based population study
N. Momenzadeh, Industrial Engineering Department, Iran University of Science and Technology, Tehran, Iran
H. Hafezalseheh, Industrial Engineering Department, Iran University of Science and Technology, Tehran, Iran; Faculté de Médecine, Université Laval, Québec, Canada
M.R. Nayebpour, Marilyn Davies College of Business, University of Houston-Downtown, Texas, USA; Corresponding author.
M. Fathian, Industrial Engineering Department, Iran University of Science and Technology, Tehran, Iran
R. Noorossana, Industrial Engineering Department, Iran University of Science and Technology, Tehran, Iran
http://www.sciencedirect.com/science/article/pii/S2352914821002379
Prostate cancer
Machine learning
FAMD
Cluster centroids
SEER
spellingShingle N. Momenzadeh
H. Hafezalseheh
M.R. Nayebpour
M. Fathian
R. Noorossana
A hybrid machine learning approach for predicting survival of patients with prostate cancer: A SEER-based population study
Informatics in Medicine Unlocked
Prostate cancer
Machine learning
FAMD
Cluster centroids
SEER
title A hybrid machine learning approach for predicting survival of patients with prostate cancer: A SEER-based population study
title_full A hybrid machine learning approach for predicting survival of patients with prostate cancer: A SEER-based population study
title_fullStr A hybrid machine learning approach for predicting survival of patients with prostate cancer: A SEER-based population study
title_full_unstemmed A hybrid machine learning approach for predicting survival of patients with prostate cancer: A SEER-based population study
title_short A hybrid machine learning approach for predicting survival of patients with prostate cancer: A SEER-based population study
title_sort hybrid machine learning approach for predicting survival of patients with prostate cancer a seer based population study
topic Prostate cancer
Machine learning
FAMD
Cluster centroids
SEER
url http://www.sciencedirect.com/science/article/pii/S2352914821002379