A hybrid machine learning approach for predicting survival of patients with prostate cancer: A SEER-based population study
With the massive incidence of cancer in recent centuries, it is crucial to carefully analyze the recorded information and provide a well-considered treatment plan for patients. Prostate cancer is a prevalent cancer among men that takes many lives annually. The widespread use of machine learn...
Main Authors: | N. Momenzadeh, H. Hafezalseheh, M.R. Nayebpour, M. Fathian, R. Noorossana |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2021-01-01 |
Series: | Informatics in Medicine Unlocked |
Subjects: | Prostate cancer, Machine learning, FAMD, Cluster centroids, SEER |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352914821002379 |
_version_ | 1819101988847091712 |
---|---|
author | N. Momenzadeh, H. Hafezalseheh, M.R. Nayebpour, M. Fathian, R. Noorossana |
author_sort | N. Momenzadeh |
collection | DOAJ |
description | With the massive incidence of cancer in recent centuries, it is crucial to carefully analyze the recorded information and provide a well-considered treatment plan for patients. Prostate cancer is a prevalent cancer among men that takes many lives annually. The widespread use of machine learning methods can be beneficial in addressing prostate cancer and reducing the large number of patients who die from it. In this research, we propose a hybrid methodology for predicting the survivability of patients with prostate cancer. As pre-processing steps on the SEER dataset, we apply the Factor Analysis of Mixed Data (FAMD) algorithm along with under-sampling methods. The main models are XGBoost, random forest (RF), support vector machine (SVM), and logistic regression (LR), with a cross-validation technique for parameter tuning. We predict both binary-labeled and multi-class-labeled cases (the latter including other causes of death), which has rarely been investigated in related studies. For the sensitivity analysis, cluster centroids were used as the under-sampling method to examine different proportions of the majority and minority classes when training the binary classifier. This analysis showed that using different ratios of the binary classes can influence prediction accuracy and can help prevent overfitting. Having evaluated the models with appropriate criteria, such as G-mean, we found that the XGBoost (86.28%) and SVM (67.81%) models outperformed the others for the two-class and three-class outcomes, respectively. Compared with similar studies, our method successfully separated patients by mortality status and by whether they died of prostate cancer, which can be important for clinical decision making and for determining whether medical experts need to change their treatment strategy. |
first_indexed | 2024-12-22T01:27:25Z |
format | Article |
id | doaj.art-e9d2c4b9f13f497ab86214a0e51c724c |
institution | Directory of Open Access Journals |
issn | 2352-9148 |
language | English |
last_indexed | 2024-12-22T01:27:25Z |
publishDate | 2021-01-01 |
publisher | Elsevier |
record_format | Article |
series | Informatics in Medicine Unlocked |
spelling | doaj.art-e9d2c4b9f13f497ab86214a0e51c724c; 2022-12-21T18:43:35Z; eng; Elsevier; Informatics in Medicine Unlocked; 2352-9148; 2021-01-01; 27; 100763; A hybrid machine learning approach for predicting survival of patients with prostate cancer: A SEER-based population study; N. Momenzadeh (Industrial Engineering Department, Iran University of Science and Technology, Tehran, Iran); H. Hafezalseheh (Industrial Engineering Department, Iran University of Science and Technology, Tehran, Iran; Faculté de Médecine, Université Laval, Québec, Canada); M.R. Nayebpour (Marilyn Davies College of Business, University of Houston-Downtown, Texas, USA; corresponding author); M. Fathian (Industrial Engineering Department, Iran University of Science and Technology, Tehran, Iran); R. Noorossana (Industrial Engineering Department, Iran University of Science and Technology, Tehran, Iran); http://www.sciencedirect.com/science/article/pii/S2352914821002379; Prostate cancer; Machine learning; FAMD; Cluster centroids; SEER |
title | A hybrid machine learning approach for predicting survival of patients with prostate cancer: A SEER-based population study |
topic | Prostate cancer, Machine learning, FAMD, Cluster centroids, SEER |
url | http://www.sciencedirect.com/science/article/pii/S2352914821002379 |
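
The description above reports model quality as G-mean rather than plain accuracy. As a rough illustration of why that matters on imbalanced survival labels, the sketch below computes G-mean under one common definition, the geometric mean of per-class recall. The record does not give the exact formula the authors used, so this definition, the function name `g_mean`, and the toy labels are all assumptions made for illustration, not the article's implementation.

```python
from collections import Counter
from math import prod


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (an assumed, commonly used definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")
    # Number of true samples in each class.
    support = Counter(y_true)
    # Number of correctly predicted samples in each class.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Recall per class; a class that is never predicted correctly has recall 0.
    recalls = [hits[c] / support[c] for c in support]
    return prod(recalls) ** (1.0 / len(recalls))


# Hypothetical toy labels: 8 majority-class cases (0) and 2 minority-class cases (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}")
print(f"G-mean   = {g_mean(y_true, y_pred):.3f}")
```

On these toy labels the classifier misses half of the minority class, yet accuracy stays at 0.9, while G-mean drops to about 0.707. The same function works unchanged for three-class labels, since it iterates over whichever classes appear in `y_true`.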