A Light Gradient-Boosting Machine algorithm with Tree-Structured Parzen Estimator for breast cancer diagnosis
Breast cancer is a common and potentially life-threatening disease. Early and accurate diagnosis of breast cancer is crucial for effective treatment and improved patient outcomes. This study proposed using the Light Gradient-Boosting Machine (LightGBM) algorithm, Borderline-Synthetic Minority Overs...
Main Authors: | Temidayo Oluwatosin Omotehinwa, David Opeoluwa Oyewola, Emmanuel Gbenga Dada |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-12-01 |
Series: | Healthcare Analytics |
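Subjects: | Breast cancer, Machine learning, Tree-Structured Parzen Estimator, Light Gradient-Boosting Machine, Borderline-SMOTE, Hyperparameter tuning |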
Online Access: | http://www.sciencedirect.com/science/article/pii/S2772442523000850 |
_version_ | 1797785481956556800 |
---|---|
author | Temidayo Oluwatosin Omotehinwa David Opeoluwa Oyewola Emmanuel Gbenga Dada |
author_sort | Temidayo Oluwatosin Omotehinwa |
collection | DOAJ |
description | Breast cancer is a common and potentially life-threatening disease. Early and accurate diagnosis of breast cancer is crucial for effective treatment and improved patient outcomes. This study proposed using the Light Gradient-Boosting Machine (LightGBM) algorithm, the Borderline-Synthetic Minority Oversampling Technique (Borderline-SMOTE), and the Tree-Structured Parzen Estimator (TPE) for hyperparameter tuning to enhance the effectiveness of the Machine Learning (ML) model for diagnosing breast cancer. A 10-fold cross-validated, TPE-optimized Borderline-SMOTE LightGBM classifier was trained on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, and its performance was compared with that of a baseline LightGBM model. The TPE-optimized Borderline-SMOTE LightGBM model exhibited a significant improvement in performance over the baseline model, achieving an average accuracy of 99.12%, specificity of 100%, precision of 100%, recall of 97.62%, F1-score of 98.80%, and a Matthews Correlation Coefficient of 98.12%. Compared to previous studies, the TPE-optimized Borderline-SMOTE LightGBM model performed exceptionally well. The study demonstrates the effectiveness of using data augmentation and hyperparameter optimization techniques to improve the performance of ML models for breast cancer diagnosis, which has significant implications for the medical field, where the accurate and efficient diagnosis of breast cancer is critical. |
first_indexed | 2024-03-13T00:55:40Z |
format | Article |
id | doaj.art-f4292d0b70f748c8b5193e56f6d171de |
institution | Directory of Open Access Journals |
issn | 2772-4425 |
language | English |
last_indexed | 2024-03-13T00:55:40Z |
publishDate | 2023-12-01 |
publisher | Elsevier |
record_format | Article |
series | Healthcare Analytics |
spelling | doaj.art-f4292d0b70f748c8b5193e56f6d171de; 2023-07-07T04:28:06Z; eng; Elsevier; Healthcare Analytics; 2772-4425; 2023-12-01; 4; 100218; A Light Gradient-Boosting Machine algorithm with Tree-Structured Parzen Estimator for breast cancer diagnosis; Temidayo Oluwatosin Omotehinwa [0]; David Opeoluwa Oyewola [1]; Emmanuel Gbenga Dada [2]; [0] Department of Mathematics and Computer Science, Federal University of Health Sciences, Otukpo P.M.B. 145, Nigeria (Correspondence to: Department of Mathematics and Computer Science, Faculty of Science, Federal University of Health Sciences, Otukpo, P.M.B. 145, Nigeria); [1] Department of Mathematics and Statistics, Federal University Kashere, Gombe P.M.B. 0182, Nigeria; [2] Department of Mathematical Sciences, University of Maiduguri, Maiduguri P.M.B. 1069, Nigeria; Breast cancer is a common and potentially life-threatening disease. Early and accurate diagnosis of breast cancer is crucial for effective treatment and improved patient outcomes. This study proposed using the Light Gradient-Boosting Machine (LightGBM) algorithm, the Borderline-Synthetic Minority Oversampling Technique (Borderline-SMOTE), and the Tree-Structured Parzen Estimator (TPE) for hyperparameter tuning to enhance the effectiveness of the Machine Learning (ML) model for diagnosing breast cancer. A 10-fold cross-validated, TPE-optimized Borderline-SMOTE LightGBM classifier was trained on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, and its performance was compared with that of a baseline LightGBM model. The TPE-optimized Borderline-SMOTE LightGBM model exhibited a significant improvement in performance over the baseline model, achieving an average accuracy of 99.12%, specificity of 100%, precision of 100%, recall of 97.62%, F1-score of 98.80%, and a Matthews Correlation Coefficient of 98.12%. Compared to previous studies, the TPE-optimized Borderline-SMOTE LightGBM model performed exceptionally well. The study demonstrates the effectiveness of using data augmentation and hyperparameter optimization techniques to improve the performance of ML models for breast cancer diagnosis, which has significant implications for the medical field, where the accurate and efficient diagnosis of breast cancer is critical.; http://www.sciencedirect.com/science/article/pii/S2772442523000850; Breast cancer, Machine learning, Tree-Structured Parzen Estimator, Light Gradient-Boosting Machine, Borderline-SMOTE, Hyperparameter tuning |
title | A Light Gradient-Boosting Machine algorithm with Tree-Structured Parzen Estimator for breast cancer diagnosis |
topic | Breast cancer Machine learning Tree-Structured Parzen Estimator Light Gradient-Boosting Machine Borderline-SMOTE Hyperparameter tuning |
url | http://www.sciencedirect.com/science/article/pii/S2772442523000850 |
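
The six evaluation metrics quoted in the description all derive from a binary confusion matrix. The sketch below shows the standard formulas in Python. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the article: the record does not report a confusion matrix or a test-set size, so the counts were chosen only because they happen to reproduce the quoted percentages.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard binary-classification metrics as percentages.

    Assumes a non-degenerate confusion matrix (no zero denominators).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews Correlation Coefficient, here scaled to a percentage
    # to match the way the description reports it.
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    values = {
        "accuracy": accuracy,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }
    return {name: round(100 * value, 2) for name, value in values.items()}


# Hypothetical counts (NOT from the article): 42 malignant and 72 benign
# cases, with a single malignant case missed and no false alarms.
example = binary_metrics(tp=41, fp=0, tn=72, fn=1)
print(example)
```

With these illustrative counts the function returns accuracy 99.12, specificity 100.0, precision 100.0, recall 97.62, F1 98.8 and MCC 98.12, which shows how one missed positive case lowers recall, F1 and MCC while specificity and precision stay at 100%.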