Breast Cancer Prediction Using Fine Needle Aspiration Features and Upsampling with Supervised Machine Learning

Breast cancer is one of the most common invasive cancers in women and it continues to be a worldwide medical problem since the number of cases has significantly increased over the past decade. Breast cancer is the second leading cause of death from cancer in women. The early detection of breast canc...


Bibliographic Details
Main Authors: Rahman Shafique, Furqan Rustam, Gyu Sang Choi, Isabel de la Torre Díez, Arif Mahmood, Vivian Lipari, Carmen Lili Rodríguez Velasco, Imran Ashraf
Format: Article
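Language: English
Published: MDPI AG 2023-01-01
Series: Cancers
Subjects: breast cancer prediction; feature selection; fine-needle aspiration features; principal component analysis; singular value decomposition; deep learning
Online Access: https://www.mdpi.com/2072-6694/15/3/681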
_version_ 1797625016599183360
author Rahman Shafique
Furqan Rustam
Gyu Sang Choi
Isabel de la Torre Díez
Arif Mahmood
Vivian Lipari
Carmen Lili Rodríguez Velasco
Imran Ashraf
author_sort Rahman Shafique
collection DOAJ
description Breast cancer is one of the most common invasive cancers in women and remains a worldwide medical problem, as the number of cases has increased significantly over the past decade. It is the second leading cause of cancer death in women. Early detection can save lives, but the traditional approach to detecting breast cancer requires various laboratory tests involving medical experts. To reduce human error and speed up detection, an automatic system is required that performs the diagnosis accurately and promptly. Despite research efforts on automated cancer detection, a wide gap exists between the desired accuracy and the accuracy that current approaches provide. To close this gap, this research proposes an approach for breast cancer prediction that selects the best fine needle aspiration features. To enhance prediction accuracy, several feature selection techniques are applied and their efficacy analyzed: principal component analysis (PCA), singular value decomposition (SVD), and chi-square (Chi2). Extensive experiments are performed with different features and feature-set sizes to identify the optimal feature set. Additionally, the influence of imbalanced versus balanced data is investigated, with balancing done using the SMOTE approach. Six classifiers, namely random forest, support vector machine, gradient boosting machine, logistic regression, multilayer perceptron, and K-nearest neighbors (KNN), are tuned to achieve increased classification accuracy. Results indicate that KNN outperforms all other classifiers on the dataset used, reaching a 100% accuracy score with 20 features using SVD and with the 15 most important features using PCA.
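The balancing and classification steps named in the description can be illustrated with a minimal sketch. It is not the authors' implementation, which this record does not include. The two-feature data are made up for illustration, the function names (`k_nearest`, `smote`, `knn_predict`) are hypothetical, and the feature selection stage (PCA, SVD, or Chi2) is left out to keep the example short. The sketch assumes the usual SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbors, followed by a majority-vote KNN.

```python
import math
import random
from collections import Counter


def k_nearest(point, pool, k):
    """Return the k points in `pool` closest to `point` (Euclidean distance)."""
    return sorted(pool, key=lambda q: math.dist(point, q))[:k]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Neighbors come from the minority class only, excluding the base point.
        others = [p for p in minority if p is not base]
        neighbor = rng.choice(k_nearest(base, others, k))
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def knn_predict(train_x, train_y, point, k=3):
    """Majority vote among the k training samples nearest to `point`."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(point, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up data: 6 samples of class 0 and 3 samples of class 1 (the minority).
benign = [(1.0, 1.2), (1.1, 0.9), (0.8, 1.0), (1.3, 1.1), (0.9, 0.8), (1.2, 1.3)]
malignant = [(3.0, 3.1), (3.2, 2.9), (2.8, 3.3)]

# Upsample the minority class until both classes have the same size.
malignant_balanced = malignant + smote(malignant, len(benign) - len(malignant), k=2)

train_x = benign + malignant_balanced
train_y = [0] * len(benign) + [1] * len(malignant_balanced)

print(Counter(train_y))                           # class counts after balancing
print(knn_predict(train_x, train_y, (2.9, 3.0)))  # query near the minority cluster
print(knn_predict(train_x, train_y, (1.0, 1.0)))  # query near the majority cluster
```

In a real experiment, synthetic samples should be generated from the training split only, so that no interpolated point leaks information from the test data into training.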
first_indexed 2024-03-11T09:50:52Z
format Article
id doaj.art-3ac96de7b3d94bb8ac5a158efc2ef83a
institution Directory Open Access Journal
issn 2072-6694
language English
last_indexed 2024-03-11T09:50:52Z
publishDate 2023-01-01
publisher MDPI AG
record_format Article
series Cancers
spelling doaj.art-3ac96de7b3d94bb8ac5a158efc2ef83a
2023-11-16T16:15:43Z
eng
MDPI AG
Cancers
2072-6694
2023-01-01
Volume 15, Issue 3, Article 681
10.3390/cancers15030681
Breast Cancer Prediction Using Fine Needle Aspiration Features and Upsampling with Supervised Machine Learning
Rahman Shafique (Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea)
Furqan Rustam (School of Computer Science, University College Dublin, D04 V1W8 Dublin, Ireland)
Gyu Sang Choi (Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea)
Isabel de la Torre Díez (Department of Signal Theory and Communications and Telematic Engineering, University of Valladolid, Paseo de Belén 15, 47011 Valladolid, Spain)
Arif Mahmood (Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur 63100, Punjab, Pakistan)
Vivian Lipari (Research Group on Foods, Nutritional Biochemistry and Health, Universidad Europea del Atlántico, Isabel Torres 21, 39011 Santander, Spain)
Carmen Lili Rodríguez Velasco (Research Group on Foods, Nutritional Biochemistry and Health, Universidad Europea del Atlántico, Isabel Torres 21, 39011 Santander, Spain)
Imran Ashraf (Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea)
https://www.mdpi.com/2072-6694/15/3/681
breast cancer prediction
feature selection
fine-needle aspiration features
principal component analysis
singular value decomposition
deep learning
title Breast Cancer Prediction Using Fine Needle Aspiration Features and Upsampling with Supervised Machine Learning
topic breast cancer prediction
feature selection
fine-needle aspiration features
principal component analysis
singular value decomposition
deep learning
url https://www.mdpi.com/2072-6694/15/3/681