Prediction of Breast Cancer Survival by Machine Learning Methods: An Application of Multiple Imputation

Bibliographic Details
Main Authors: Hadi Lotfnezhad Afshar, Nasrollah JABBARI, Hamid Reza KHALKHALI, Omid ESNAASHARI
Format: Article
Language: English
Published: Tehran University of Medical Sciences, 2021-02-01
Series: Iranian Journal of Public Health
Subjects: Breast neoplasms; Survival; Observer variation; Imputation; Machine learning
Online Access: https://ijph.tums.ac.ir/index.php/ijph/article/view/16101
Volume/Issue: 50(3)
DOI: 10.18502/ijph.v50i3.5606
ISSN: 2251-6085, 2251-6093
Collection: DOAJ

Author Affiliations
Hadi Lotfnezhad Afshar: Department of Health Information Technology, School of Paramedical, Urmia University of Medical Sciences, Urmia, Iran
Nasrollah JABBARI: Department of Medical Physics, Solid Tumor Research Center, School of Paramedical, Urmia University of Medical Sciences, Urmia, Iran
Hamid Reza KHALKHALI: Department of Biostatistics and Epidemiology, Patient Safety Research Center, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
Omid ESNAASHARI: Omid Treatment and Research Center, Urmia, Iran

Abstract
Background: Low breast cancer survival rates in less developed countries are a critical concern. Machine learning techniques can predict cancer survival with high accuracy, but missing data are the most important obstacle to realizing the full potential of these techniques. In this study, multiple imputation (MI) was implemented and analyzed in detail to impute the missing data of a breast cancer dataset.
Methods: The dataset, collected at the Omid Treatment and Research Center, Urmia, Iran, between January 2006 and December 2012, contained information on 856 women. Algorithms including C5 and repeated incremental pruning to produce error reduction (RIPPER) were applied to the imputed versions of the original dataset and to the non-imputed dataset to predict survival and extract clinical rules, respectively.
Results: The performance of C5 improved after imputation on all evaluation criteria, including accuracy (84.42%), sensitivity (92.21%), specificity (64%), Kappa statistic (59.06%), and area under the receiver operating characteristic (ROC) curve (0.84).
Conclusion: The dataset of the present study met the requirements for applying multiple imputation. The rules extracted after applying MI were more comprehensive and contained more clinically relevant knowledge. However, filling in the missing data did not noticeably increase the clinical value of the extracted rules.
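The abstract reports C5's performance as accuracy, sensitivity, specificity, Kappa statistic and ROC AUC. As a reading aid, the sketch below shows how the first four are conventionally computed from a binary confusion matrix. It is a minimal illustration under stated assumptions, not the authors' code: the `ConfusionMatrix` type and the function names are hypothetical, the counts are invented and do not reproduce the study's figures, and which survival outcome is treated as "positive" is an assumption. The abstract expresses Kappa as a percentage, so 59.06% corresponds to a kappa of 0.5906 on the usual scale. ROC AUC is left out because it needs per-patient predicted scores, not just a confusion matrix.

```python
# Sketch: standard binary-classification metrics named in the abstract.
# The ConfusionMatrix type, function names, and counts are hypothetical
# illustrations; they are not the study's code or data.
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; 'positive' is an assumed class label."""
    tp: int  # actual positive, predicted positive
    fn: int  # actual positive, predicted negative
    fp: int  # actual negative, predicted positive
    tn: int  # actual negative, predicted negative

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn


def accuracy(cm: ConfusionMatrix) -> float:
    """Share of all cases classified correctly."""
    return (cm.tp + cm.tn) / cm.total


def sensitivity(cm: ConfusionMatrix) -> float:
    """True-positive rate: share of actual positives predicted positive."""
    return cm.tp / (cm.tp + cm.fn)


def specificity(cm: ConfusionMatrix) -> float:
    """True-negative rate: share of actual negatives predicted negative."""
    return cm.tn / (cm.tn + cm.fp)


def cohen_kappa(cm: ConfusionMatrix) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = cm.total
    p_observed = (cm.tp + cm.tn) / n
    # Chance agreement per class: predicted marginal share times actual marginal share.
    p_chance_pos = ((cm.tp + cm.fp) / n) * ((cm.tp + cm.fn) / n)
    p_chance_neg = ((cm.fn + cm.tn) / n) * ((cm.fp + cm.tn) / n)
    p_expected = p_chance_pos + p_chance_neg
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts for 150 patients (illustration only, not study data).
example = ConfusionMatrix(tp=90, fn=10, fp=15, tn=35)

for name, metric in [("accuracy", accuracy), ("sensitivity", sensitivity),
                     ("specificity", specificity), ("kappa", cohen_kappa)]:
    print(f"{name}: {metric(example):.3f}")
```

With these made-up counts the script prints accuracy 0.833, sensitivity 0.900, specificity 0.700 and kappa 0.615. Kappa comes out lower than accuracy because it discounts the agreement expected by chance when one class dominates, which is why it is often reported alongside accuracy on imbalanced clinical data.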