CogNet: classification of gene expression data based on ranked active-subnetwork-oriented KEGG pathway enrichment analysis

Most of the traditional gene selection approaches are borrowed from other fields such as statistics and computer science. However, they do not prioritize biologically relevant genes since the ultimate goal is to determine features that optimize model performance metrics, not to build a biologically m...

Bibliographic Details
Main Authors: Malik Yousef, Ege Ülgen, Osman Uğur Sezerman
Format: Article
Language: English
Published: PeerJ Inc. 2021-02-01
Series: PeerJ Computer Science
Subjects: Classification; Gene expression; Enrichment analysis; KEGG pathway; Rank; Machine learning
Online Access: https://peerj.com/articles/cs-336.pdf
_version_ 1818925531247149056
author Malik Yousef
Ege Ülgen
Osman Uğur Sezerman
collection DOAJ
description Most of the traditional gene selection approaches are borrowed from other fields such as statistics and computer science. However, they do not prioritize biologically relevant genes, since the ultimate goal is to determine features that optimize model performance metrics, not to build a biologically meaningful model. Therefore, there is a pressing need for new computational tools that integrate biological knowledge about the data into the process of gene selection and machine learning. Integrative gene selection enables the incorporation of biological domain knowledge from external biological resources. In this study, we propose a new computational approach named CogNet, an integrative gene selection tool that exploits biological knowledge to group genes for the computational modeling tasks of ranking and classification. In CogNet, pathfindR serves as the biological grouping tool, allowing the main algorithm to rank active-subnetwork-oriented KEGG pathway enrichment analysis results and build a biologically relevant model. CogNet provides a list of significant KEGG pathways that can classify the data with very high accuracy. The list also provides the differentially expressed genes belonging to these pathways, which are used as features in the classification problem. The list facilitates deep analysis and better interpretability of the role of KEGG pathways in classifying the data, thus better establishing the biological relevance of these differentially expressed genes. Even though the main aim of our study is not to improve the accuracy of any existing tool, CogNet outperforms a similar approach called maTE while performing comparably to other similar tools, including SVM-RCE. CogNet was tested on 13 gene expression datasets concerning a variety of diseases.
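The group-then-rank-then-classify idea in the description can be illustrated with a minimal sketch. This is not the CogNet implementation: the pathfindR enrichment step is replaced here by a hand-written dictionary of gene groups, a simple nearest-centroid classifier with leave-one-out scoring stands in for whatever classifier and validation scheme the authors use, and every name (`rank_groups`, `top_features`, `pathway_A`, `G1`, the toy expression values) is hypothetical.

```python
from statistics import mean


def centroid(rows):
    """Column-wise mean of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed stand-in)."""
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Hold out sample i and build one centroid per class from the rest.
        by_class = {}
        for j, (xj, yj) in enumerate(zip(X, y)):
            if j != i:
                by_class.setdefault(yj, []).append(xj)
        cents = {lab: centroid(rows) for lab, rows in by_class.items()}
        # Predict the class whose centroid is closest (squared Euclidean distance).
        pred = min(
            cents,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(xi, cents[lab])),
        )
        correct += pred == yi
    return correct / len(y)


def rank_groups(expr, labels, groups):
    """Score each gene group by how well its genes alone separate the classes."""
    scored = []
    for name, genes in groups.items():
        present = [g for g in genes if g in expr]
        if not present:
            continue
        # Transpose gene-wise vectors into one feature vector per sample.
        X = [list(vals) for vals in zip(*(expr[g] for g in present))]
        scored.append((name, loo_accuracy(X, labels)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def top_features(ranked, groups, k):
    """Collect the genes of the k best-ranked groups, without duplicates."""
    features = []
    for name, _score in ranked[:k]:
        for gene in groups[name]:
            if gene not in features:
                features.append(gene)
    return features


# Toy data (hypothetical): gene -> expression value per sample.
expr = {
    "G1": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
    "G2": [3.0, 3.1, 2.9, 0.5, 0.4, 0.6],
    "G3": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0],
    "G4": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
}
labels = ["case", "case", "case", "ctrl", "ctrl", "ctrl"]
# Hypothetical gene groups standing in for enriched pathways.
groups = {"pathway_A": ["G1", "G2"], "pathway_B": ["G3", "G4"]}

ranked = rank_groups(expr, labels, groups)
features = top_features(ranked, groups, k=1)
print(ranked)
print(features)
```

In this toy setting the group whose genes track the class labels ranks first, and its genes become the feature set for a final classifier. Scoring each group in isolation keeps the ranking interpretable at the pathway level, at the cost of ignoring interactions between groups.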
first_indexed 2024-12-20T02:42:42Z
format Article
id doaj.art-6a9539073a464c23bb484409ff940634
institution Directory Open Access Journal
issn 2376-5992
language English
last_indexed 2024-12-20T02:42:42Z
publishDate 2021-02-01
publisher PeerJ Inc.
record_format Article
series PeerJ Computer Science
spelling doaj.art-6a9539073a464c23bb484409ff940634
2022-12-21T19:56:17Z
eng
PeerJ Inc.
PeerJ Computer Science
2376-5992
2021-02-01
7
e336
10.7717/peerj-cs.336
CogNet: classification of gene expression data based on ranked active-subnetwork-oriented KEGG pathway enrichment analysis
Malik Yousef (Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel)
Ege Ülgen (Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey)
Osman Uğur Sezerman (Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey)
https://peerj.com/articles/cs-336.pdf
title CogNet: classification of gene expression data based on ranked active-subnetwork-oriented KEGG pathway enrichment analysis
topic Classification
Gene expression
Enrichment analysis
KEGG pathway
Rank
Machine learning
url https://peerj.com/articles/cs-336.pdf