A Light Gradient-Boosting Machine algorithm with Tree-Structured Parzen Estimator for breast cancer diagnosis

Bibliographic Details
Main Authors: Temidayo Oluwatosin Omotehinwa, David Opeoluwa Oyewola, Emmanuel Gbenga Dada
Format: Article
Language: English
Published: Elsevier 2023-12-01
Series: Healthcare Analytics
Subjects: Breast cancer, Machine learning, Tree-Structured Parzen Estimator, Light Gradient-Boosting Machine, Borderline-SMOTE, Hyperparameter tuning
Online Access: http://www.sciencedirect.com/science/article/pii/S2772442523000850
ISSN: 2772-4425
Volume: 4
Article number: 100218
Collection: DOAJ (Directory of Open Access Journals)
Affiliations:
Temidayo Oluwatosin Omotehinwa (corresponding author): Department of Mathematics and Computer Science, Faculty of Science, Federal University of Health Sciences, Otukpo, P.M.B. 145, Nigeria
David Opeoluwa Oyewola: Department of Mathematics and Statistics, Federal University Kashere, Gombe, P.M.B. 0182, Nigeria
Emmanuel Gbenga Dada: Department of Mathematical Sciences, University of Maiduguri, Maiduguri, P.M.B. 1069, Nigeria

Abstract:
Breast cancer is a common and potentially life-threatening disease. Early and accurate diagnosis of breast cancer is crucial for effective treatment and improved patient outcomes. This study proposed using the Light Gradient-Boosting Machine (LightGBM) algorithm, the Borderline-Synthetic Minority Oversampling Technique (Borderline-SMOTE), and the Tree-Structured Parzen Estimator (TPE) for hyperparameter tuning to enhance the effectiveness of a Machine Learning (ML) model for diagnosing breast cancer. A 10-fold cross-validated, TPE-optimized Borderline-SMOTE LightGBM classifier was modelled on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, and its performance was compared with that of a baseline LightGBM model. The TPE-optimized Borderline-SMOTE LightGBM model showed a significant improvement over the baseline, achieving an average accuracy of 99.12%, specificity of 100%, precision of 100%, recall of 97.62%, F1-score of 98.80%, and a Matthews Correlation Coefficient (MCC) of 98.12%. Compared with previous studies, the TPE-optimized Borderline-SMOTE LightGBM model performed exceptionally well. The study demonstrates the effectiveness of data augmentation and hyperparameter optimization for improving ML models for breast cancer diagnosis, with significant implications for medicine, where accurate and efficient diagnosis of breast cancer is critical.
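The abstract reports six evaluation metrics. As a minimal sketch (Python, standard library only), the code below computes them from binary confusion-matrix counts using their standard definitions, treating malignant as the positive class. The `Confusion` class, the `metrics` function and the example counts are hypothetical illustrations: the abstract does not report a confusion matrix, and its figures are described as averages over cross-validation, so the counts are an assumption chosen for illustration, not the authors' data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts, with malignant as the positive class."""
    tp: int  # malignant cases predicted malignant
    fn: int  # malignant cases predicted benign (missed cancers)
    fp: int  # benign cases predicted malignant (false alarms)
    tn: int  # benign cases predicted benign


def metrics(c: Confusion) -> dict[str, float]:
    """Return the six metrics named in the abstract as fractions.

    Assumes no denominator is zero (at least one case of each class,
    at least one positive prediction).
    """
    total = c.tp + c.fn + c.fp + c.tn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)          # also called sensitivity
    specificity = c.tn / (c.tn + c.fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews Correlation Coefficient: ranges from -1 to 1.
    mcc_den = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    mcc = (c.tp * c.tn - c.fp * c.fn) / mcc_den
    return {
        "accuracy": (c.tp + c.tn) / total,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, NOT taken from the paper: 114 cases, 42 malignant,
# one malignant case missed, no false alarms.
example = Confusion(tp=41, fn=1, fp=0, tn=72)
for name, value in metrics(example).items():
    print(f"{name:>11}: {100 * value:.2f}%")
```

With these assumed counts the script prints 99.12%, 100.00%, 100.00%, 97.62%, 98.80% and 98.12%, the same values as the abstract. Note that 100% precision and 100% specificity both require zero false positives, so the only errors consistent with those two figures are missed malignant cases.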