GeNetOntology: identifying affected gene ontology terms via grouping, scoring, and modeling of gene expression data utilizing biological knowledge-based machine learning

Bibliographic Details
Main Authors: Nur Sebnem Ersoz, Burcu Bakir-Gungor, Malik Yousef
Format: Article
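Language: English
Published: Frontiers Media S.A., 2023-08-01
Series: Frontiers in Genetics
Subjects: gene ontology; gene expression data analysis; machine learning; feature selection; enrichment analysis; feature scoring
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2023.1139082/full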
collection DOAJ
description Introduction: Identifying significant sets of genes that are up- or downregulated under specific conditions is vital for understanding disease mechanisms at the molecular level. To analyze transcriptomic data, several computational feature selection (i.e., gene selection) methods have been proposed. Beyond selecting genes, uncovering their core functions provides a deeper understanding of diseases. To address this problem, biological domain knowledge-based feature selection methods have been proposed. Unlike purely computational gene selection approaches, these methods take the underlying biology into account and integrate knowledge from external biological resources. Gene Ontology (GO) is one such resource: it provides ontology terms that describe the molecular functions, cellular components, and biological processes of gene products.
Methods: In this study, we developed a tool named GeNetOntology, which performs GO-based feature selection for gene expression data analysis. The proposed approach uses a Grouping, Scoring, and Modeling (G-S-M) process to identify significant GO terms. GO annotations serve as the grouping information and are embedded into a machine learning (ML) algorithm that selects informative ontology terms. The genes annotated with the selected terms are then used to train the ML model for the classification task. The output is a set of important GO terms for a two-class classification task on gene expression data for a given phenotype.
Results: We tested our approach on 11 different gene expression datasets, and the results showed that GeNetOntology successfully identified important disease-related ontology terms to be used in the classification model.
Discussion: GeNetOntology will assist geneticists and scientists in identifying disease-related genes and ontologies in transcriptomic data analysis, and it will also help doctors design diagnosis platforms and improve patient treatment plans.
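The G-S-M loop summarized in the Methods paragraph can be illustrated with a small, dependency-free sketch. It is not GeNetOntology's implementation and rests on several assumptions: the abstract names only "a machine learning (ML) algorithm", so a nearest-centroid classifier is swapped in; each group is scored by stratified 3-fold cross-validated accuracy on the training data; and the parameters `top_n` and `min_size`, the term names `TERM_A` to `TERM_D`, the gene names, and the expression values are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Dict[str, float], int]  # (gene -> expression value, class label 0/1)


def nearest_centroid_accuracy(train: List[Sample], test: List[Sample],
                              genes: Sequence[str]) -> float:
    """Fit one centroid per class on `genes` only; return accuracy on `test`.

    Stand-in classifier (assumption): the source only says "an ML algorithm".
    """
    centroids = {
        label: [mean(x[g] for x, y in train if y == label) for g in genes]
        for label in (0, 1)
    }

    def predict(x: Dict[str, float]) -> int:
        v = [x[g] for g in genes]
        dist = {lab: sum((a - b) ** 2 for a, b in zip(v, c))
                for lab, c in centroids.items()}
        return min(dist, key=dist.get)  # ties resolve to label 0

    return mean(1.0 if predict(x) == y else 0.0 for x, y in test)


def stratified_folds(data: List[Sample], k: int) -> List[List[int]]:
    """Deal the indices of each class round-robin into k folds."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, (_, y) in enumerate(data) if y == label]
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def group(annotations: Dict[str, List[str]], measured: set,
          min_size: int = 2) -> Dict[str, List[str]]:
    """Grouping: keep each term's measured genes; drop terms with too few."""
    groups = {}
    for term, genes in annotations.items():
        present = [g for g in genes if g in measured]
        if len(present) >= min_size:
            groups[term] = present
    return groups


def score_group(data: List[Sample], genes: Sequence[str], k: int = 3) -> float:
    """Scoring: mean k-fold cross-validated accuracy using only this group's genes."""
    accs = []
    for fold in stratified_folds(data, k):
        held = set(fold)
        train = [d for i, d in enumerate(data) if i not in held]
        test = [data[i] for i in fold]
        accs.append(nearest_centroid_accuracy(train, test, genes))
    return mean(accs)


def gsm(train: List[Sample], test: List[Sample], annotations: Dict[str, List[str]],
        top_n: int = 2) -> Tuple[List[str], List[str], float]:
    """Grouping -> Scoring -> Modeling on a two-class expression dataset."""
    groups = group(annotations, set(train[0][0]))
    scores = {t: score_group(train, genes) for t, genes in groups.items()}
    # Rank terms by score (ties broken by name) and keep the top_n.
    top = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    genes = sorted({g for t in top for g in groups[t]})
    # Modeling: train on the union of genes annotated with the selected terms.
    return top, genes, nearest_centroid_accuracy(train, test, genes)


def make_sample(i: int, label: int) -> Sample:
    """Made-up sample: G1/G2 track the class; N1/N2/N3 carry weak or no signal."""
    jitter = (i % 3) * 0.1
    return ({"G1": 1.0 + 4.0 * label + jitter, "G2": 2.0 + 3.0 * label - jitter,
             "N1": float(i % 2), "N2": float((i // 2) % 2), "N3": 0.5 * (i % 3)},
            label)


# Hypothetical term names and annotations, not real GO identifiers.
ANNOTATIONS = {"TERM_A": ["G1", "G2"], "TERM_B": ["N1", "N2"],
               "TERM_C": ["G1", "N3"], "TERM_D": ["N3", "MISSING"]}
TRAIN = [make_sample(i, 0) for i in range(6)] + [make_sample(i, 1) for i in range(6, 12)]
TEST = [make_sample(i, 0) for i in range(12, 15)] + [make_sample(i, 1) for i in range(15, 18)]

if __name__ == "__main__":
    print(gsm(TRAIN, TEST, ANNOTATIONS))
```

Running it prints `(['TERM_A', 'TERM_C'], ['G1', 'G2', 'N3'], 1.0)`: TERM_D is dropped at the grouping step because only one of its genes is measured, the noise-dominated TERM_B is ranked out at the scoring step, and the final model is trained on the genes of the two retained terms. Scoring groups on the training split only, and touching TEST solely in the modeling step, keeps the held-out samples out of term selection.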
first_indexed 2024-03-12T14:03:35Z
id doaj.art-467206241eb14a6ca427aa874724d596
institution Directory Open Access Journal
issn 1664-8021
last_indexed 2024-03-12T14:03:35Z
volume 14
article_number 1139082
doi 10.3389/fgene.2023.1139082
affiliation Nur Sebnem Ersoz: Department of Bioengineering, Graduate School of Engineering and Science, Abdullah Gul University, Kayseri, Türkiye
Burcu Bakir-Gungor: Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Türkiye; Department of Bioengineering, Faculty of Life and Natural Sciences, Abdullah Gul University, Kayseri, Türkiye
Malik Yousef: Department of Information Systems, Zefat Academic College, Zefat, Israel; Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
topic gene ontology
gene expression data analysis
machine learning
feature selection
enrichment analysis
feature scoring