Machine learning-based integrated identification of predictive combined diagnostic biomarkers for endometriosis

Background: Endometriosis (EM) is a common gynecological condition in women of reproductive age, with diverse causes and a pathogenesis that is not yet fully understood. Traditional diagnostics rely on single biomarkers and do not integrate multiple biomarkers. This study introduc...

Bibliographic Details
Main Authors: Haolong Zhang, Haoling Zhang, Huadi Yang, Ahmad Naqib Shuid, Doblin Sandai, Xingbei Chen
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-11-01
Series: Frontiers in Genetics
Subjects: endometriosis, combined biomarkers, machine learning, predictive, diagnostic
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2023.1290036/full
collection DOAJ
description Background: Endometriosis (EM) is a common gynecological condition in women of reproductive age, with diverse causes and a pathogenesis that is not yet fully understood. Traditional diagnostics rely on single biomarkers and do not integrate multiple biomarkers. This study applies multiple machine learning techniques to improve the accuracy of predictive models. A novel diagnostic approach that combines several biomarkers offers a new clinical perspective for improving the diagnostic efficiency of endometriosis and holds significant potential for clinical application.
Methods: GSE51981 was used as a test set, and 11 machine learning algorithms (Lasso, Stepglm, glmBoost, Support Vector Machine, Ridge, Enet, plsRglm, Random Forest, LDA, XGBoost, and NaiveBayes) were employed to construct 113 predictive models for endometriosis. The optimal model was selected based on the AUC values obtained from the various algorithms. The genes retained by that model were then evaluated using nine machine learning algorithms (Random Forest, SVM, Gradient Boosting Machine, LASSO, XGB, NNET, Generalized Linear Model, KNN, and Decision Tree) to assess significance scores and identify diagnostic genes for each algorithm. The diagnostic value of these genes was further validated in the external datasets GSE7305, GSE11691, and GSE120103.
Results: Analysis of the GSE51981 dataset revealed 62 differentially expressed genes (DEGs). Based on AUC evaluation, the Stepglm [Both] and plsRglm algorithms identified the 30 genes with the most potential. Nine machine learning algorithms were then applied to select diagnostic genes, and the LASSO algorithm identified five key diagnostic genes. ADAT1 showed the best single-gene predictive performance, with an AUC of 0.785. The combination of FOS, EPHX1, DLGAP5, PCSK5, and ADAT1 achieved an AUC of 0.836 in the test dataset. Moreover, these genes consistently exhibited an AUC exceeding 0.78 in all validation datasets, indicating robust predictive performance. Correlation analysis with immune infiltration further supported their predictive value by showing a close relationship between the diagnostic genes and immune infiltrating cells.
Conclusion: A combination of biomarkers consisting of FOS, EPHX1, DLGAP5, PCSK5, and ADAT1 can serve as a diagnostic tool for endometriosis, enhancing diagnostic efficiency. The association of these genes with immune infiltrating cells points to their potential role in the pathogenesis of endometriosis, providing new insights for early detection and treatment.
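The comparison between single-gene and combined-panel AUC reported in the Results can be illustrated with a minimal sketch. This is not the authors' pipeline: the expression values are invented toy numbers (not taken from GSE51981), the gene names `GENE_A` and `GENE_B` are placeholders, and the equal weights in `toy_weights` are hypothetical rather than fitted by any of the algorithms named in the abstract. The sketch only shows how a rank-based AUC is computed and how a combined score can separate cases from controls better than either marker alone.

```python
from typing import Dict, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: fraction of (case, control) pairs in which the case
    has the higher score, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def combined_score(expr: Dict[str, List[float]],
                   weights: Dict[str, float]) -> List[float]:
    """Weighted sum of per-gene expression values for each sample."""
    genes = list(weights)
    n_samples = len(expr[genes[0]])
    return [sum(weights[g] * expr[g][i] for g in genes)
            for i in range(n_samples)]


# Toy data, invented for illustration: three controls followed by three cases.
labels = [0, 0, 0, 1, 1, 1]
toy_expr = {
    "GENE_A": [1.0, 2.0, 3.0, 2.5, 3.5, 4.0],
    "GENE_B": [2.0, 1.0, 0.5, 1.5, 2.5, 3.0],
}
toy_weights = {"GENE_A": 1.0, "GENE_B": 1.0}  # hypothetical, not fitted

single_auc = {gene: auc(values, labels) for gene, values in toy_expr.items()}
panel_auc = auc(combined_score(toy_expr, toy_weights), labels)

print(single_auc)
print(panel_auc)
```

In this toy example each marker misranks one case-control pair on its own, while the summed score ranks every case above every control. With real data the weights would come from a fitted model and the AUC would be checked on independent datasets, as the abstract describes for GSE7305, GSE11691, and GSE120103.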
first_indexed 2024-03-09T02:38:17Z
id doaj.art-4ca0a131507f493a830907714a58e5d4
institution Directory Open Access Journal
issn 1664-8021
last_indexed 2024-03-09T02:38:17Z
spelling doaj.art-4ca0a131507f493a830907714a58e5d4; 2023-12-06T07:58:56Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2023-11-01; 14; 10.3389/fgene.2023.1290036; 1290036
Haolong Zhang (0), Haoling Zhang (1), Huadi Yang (2), Ahmad Naqib Shuid (3), Doblin Sandai (4), Xingbei Chen (5)
0: Department of Biomedical Sciences, Advanced Medical and Dental Institute, Universiti Sains Malaysia, Penang, Malaysia
1: Department of Biomedical Sciences, Advanced Medical and Dental Institute, Universiti Sains Malaysia, Penang, Malaysia
2: The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China
3: Department of Biomedical Sciences, Advanced Medical and Dental Institute, Universiti Sains Malaysia, Penang, Malaysia
4: Department of Community Health, Advanced Medical and Dental Institute, Universiti Sains Malaysia, Penang, Malaysia
5: The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China
topic endometriosis
combined biomarkers
machine learning
predictive
diagnostic