Polycystic Ovary Syndrome Detection Machine Learning Model Based on Optimized Feature Selection and Explainable Artificial Intelligence

Polycystic ovary syndrome (PCOS) has been classified as a severe health problem common among women globally. Early detection and treatment of PCOS reduce the risk of long-term complications, such as type 2 diabetes and gestational diabetes. Therefore, effe...


Bibliographic Details
Main Authors: Hela Elmannai, Nora El-Rashidy, Ibrahim Mashal, Manal Abdullah Alohali, Sara Farag, Shaker El-Sappagh, Hager Saleh
Format: Article
Language: English
Published: MDPI AG 2023-04-01
Series: Diagnostics
Subjects: polycystic ovary syndrome; machine learning; explainable machine learning; ensemble learning
Online Access: https://www.mdpi.com/2075-4418/13/8/1506
_version_ 1827745333115879424
author Hela Elmannai
Nora El-Rashidy
Ibrahim Mashal
Manal Abdullah Alohali
Sara Farag
Shaker El-Sappagh
Hager Saleh
author_sort Hela Elmannai
collection DOAJ
description Polycystic ovary syndrome (PCOS) has been classified as a severe health problem common among women globally. Early detection and treatment of PCOS reduce the risk of long-term complications, such as type 2 diabetes and gestational diabetes. Effective and early PCOS diagnosis will therefore help healthcare systems reduce the disease’s problems and complications. Machine learning (ML) and ensemble learning have recently shown promising results in medical diagnostics. The main goal of our research is to provide model explanations to ensure efficiency, effectiveness, and trust in the developed model through local and global explanations. Feature selection methods are applied with different types of ML models (logistic regression (LR), random forest (RF), decision tree (DT), naive Bayes (NB), support vector machine (SVM), k-nearest neighbor (KNN), XGBoost, and AdaBoost) to obtain the optimal feature subset and the best model. Stacking ML models that combine the best base ML models with a meta-learner are proposed to improve performance. Bayesian optimization is used to optimize the ML models. Combining SMOTE (Synthetic Minority Oversampling Technique) and ENN (Edited Nearest Neighbour) addresses the class imbalance. The experiments were conducted on a benchmark PCOS dataset with two split ratios, 70:30 and 80:20. The results showed that the stacking ML model with REF feature selection recorded the highest accuracy, at 100%, compared to other models.
first_indexed 2024-03-11T05:06:36Z
format Article
id doaj.art-f14e3738d3ff4b0d9d8e3ff1f580504d
institution Directory Open Access Journal
issn 2075-4418
language English
last_indexed 2024-03-11T05:06:36Z
publishDate 2023-04-01
publisher MDPI AG
record_format Article
series Diagnostics
spelling doaj.art-f14e3738d3ff4b0d9d8e3ff1f580504d
2023-11-17T18:56:01Z
eng
MDPI AG
Diagnostics
2075-4418
2023-04-01
Volume 13, Issue 8, Article 1506
DOI: 10.3390/diagnostics13081506
Hela Elmannai (0): Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
Nora El-Rashidy (1): Machine Learning and Information Retrieval Department, Faculty of Artificial Intelligence, Kafrelsheiksh University, Kafrelsheiksh 13518, Egypt
Ibrahim Mashal (2): Faculty of Information Technology, Applied Science Private University, Amman 11937, Jordan
Manal Abdullah Alohali (3): Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
Sara Farag (4): Faculty of Computers and Informations, South Valley University, Qena 83523, Egypt
Shaker El-Sappagh (5): Faculty of Computer Science and Engineering, Galala University, Suez 435611, Egypt
Hager Saleh (6): Faculty of Computers and Artificial Intelligence, South Valley University, Hurghada 84511, Egypt
title Polycystic Ovary Syndrome Detection Machine Learning Model Based on Optimized Feature Selection and Explainable Artificial Intelligence
topic polycystic ovary syndrome
machine learning
explainable machine learning
ensemble learning
url https://www.mdpi.com/2075-4418/13/8/1506
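
The description field above mentions combining SMOTE and ENN to address class imbalance. The record does not give the authors' implementation, so the following is only a minimal pure-Python sketch of the general idea, under stated assumptions: Euclidean distance, a binary label, at least two minority samples, and hypothetical helper names (`k_nearest`, `smote`, `enn`) and made-up toy data. In practice a library such as imbalanced-learn (its `SMOTEENN` class) would more likely be used.

```python
import math
import random
from collections import Counter


def k_nearest(idx, points, k):
    """Indices of the k points closest to points[idx] (Euclidean), excluding idx itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority, k=3, seed=0):
    """Add synthetic `minority` samples until that class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_new = max(counts.values()) - counts[minority]
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(k_nearest(i, minority_pts, k))
        gap = rng.random()
        # The synthetic point lies on the segment between a minority sample
        # and one of its nearest minority neighbours.
        synthetic = tuple(a + gap * (b - a)
                          for a, b in zip(minority_pts[i], minority_pts[j]))
        X_out.append(synthetic)
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Drop each sample whose label disagrees with the majority of its k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in k_nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (made up for illustration): six majority points, three minority points.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.3), (0.1, 0.0),
     (2.0, 2.0), (2.1, 2.2), (2.2, 2.0)]
y = [0] * 6 + [1] * 3

X_bal, y_bal = smote(X, y, minority=1, k=2)   # oversample first
X_clean, y_clean = enn(X_bal, y_bal, k=3)     # then clean noisy samples
print(Counter(y_bal), len(X_clean))
```

Oversampling before cleaning follows the usual SMOTE-then-ENN order: SMOTE balances the class counts, and ENN then removes samples from either class that sit among neighbours of the other class. Resampling of this kind is normally applied to the training split only, so the held-out split keeps its original class distribution.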