PriPath: identifying dysregulated pathways from differential gene expression via grouping, scoring, and modeling with an embedded feature selection approach

Bibliographic Details
Main Authors: Malik Yousef, Fatma Ozdemir, Amhar Jaber, Jens Allmer, Burcu Bakir-Gungor
Format: Article
Language: English
Published: BMC 2023-02-01
Series: BMC Bioinformatics
Online Access: https://doi.org/10.1186/s12859-023-05187-2

Abstract

Background: Cell homeostasis relies on the concerted actions of genes, and dysregulated genes can lead to disease. In living organisms, genes and their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto Encyclopedia of Genes and Genomes (KEGG) systematically analyzes gene functions, proteins, and molecules and combines them into pathways. Gene expression measurements (e.g., RNA-seq data) can be mapped to KEGG pathways to determine which modules are affected or dysregulated in a disease. However, genes that act in multiple pathways, among other inherent issues, complicate such analyses. Many current approaches rely on gene expression data alone and pay too little attention to the knowledge already stored in KEGG pathways when detecting dysregulated pathways. New methods that incorporate more precompiled knowledge are needed to associate gene expression with disease more holistically.

Results: PriPath is a novel approach that applies the generic process of grouping and scoring, followed by modeling, to the analysis of gene expression with KEGG pathways. In PriPath, KEGG pathways serve as the grouping function within a machine learning algorithm that selects the most significant KEGG pathways. A machine learning model is trained on those groups to distinguish disease samples from controls. We tested PriPath on 13 gene expression datasets covering various cancers and other diseases. The approach successfully assigned biologically and clinically relevant KEGG terms to the samples based on their differentially expressed genes. We also compared the performance of PriPath with that of comparable tools. For each dataset, we manually checked the top PriPath results against the literature and found that most predictions are supported by previous experimental research.

Conclusions: PriPath can thus aid in identifying dysregulated pathways, with applications in medical diagnostics. In the future, we aim to extend the approach so that it can stratify patients based on gene expression and identify druggable targets, thereby covering two aspects of precision medicine.
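
The abstract describes the PriPath pipeline only at a high level: genes are grouped by KEGG pathway, each group is scored, and a classifier is trained on the best-scoring groups. The following is a minimal sketch of such a grouping, scoring, and modeling loop under explicit assumptions; it is not the PriPath implementation. The nearest-centroid classifier, the leave-one-out scoring, the `top_k` cutoff, the function names (`centroid_predict`, `loo_accuracy`, `group_score_model`), and the toy expression values and pathway names are all hypothetical stand-ins. In practice the gene-to-pathway mapping would come from KEGG (for example via the KEGG REST `link` operation), and a stronger classifier would likely be used.

```python
from statistics import mean
from typing import Callable, Dict, List, Set, Tuple

Sample = Dict[str, float]           # gene identifier -> expression value
Labeled = List[Tuple[Sample, int]]  # (sample, label); 0 = control, 1 = disease


def centroid_predict(train: Labeled, x: Sample, genes: List[str]) -> int:
    """Hypothetical stand-in classifier: nearest class centroid using only `genes`."""
    best_label, best_dist = -1, float("inf")
    for label in sorted({y for _, y in train}):
        rows = [s for s, y in train if y == label]
        centroid = {g: mean(r[g] for r in rows) for g in genes}
        dist = sum((x[g] - centroid[g]) ** 2 for g in genes)
        if dist < best_dist:  # ties go to the smaller label
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(data: Labeled, genes: List[str]) -> float:
    """Score one gene group by leave-one-out accuracy using only that group's genes."""
    hits = 0
    for i, (x, y) in enumerate(data):
        hits += centroid_predict(data[:i] + data[i + 1:], x, genes) == y
    return hits / len(data)


def group_score_model(data: Labeled, pathways: Dict[str, Set[str]], top_k: int = 2):
    measured = set(data[0][0])
    # Grouping: map each pathway to its measured genes; one gene may sit in several groups.
    groups = {p: sorted(g & measured) for p, g in pathways.items() if g & measured}
    # Scoring: rate each group by how well its genes alone separate disease from control.
    scores = {p: loo_accuracy(data, genes) for p, genes in groups.items()}
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    # Modeling: the final classifier uses the union of genes from the top-k groups.
    features = sorted(set().union(*(groups[p] for p in ranked[:top_k])))
    model: Callable[[Sample], int] = lambda x: centroid_predict(data, x, features)
    return ranked, scores, features, model


# Toy, made-up expression values and pathway names (not real KEGG annotations).
ctrl = [{"G1": 1.0, "G2": 1.2, "G3": 5.0, "G4": 2.0, "G5": 0.9},
        {"G1": 0.8, "G2": 1.0, "G3": 2.0, "G4": 5.0, "G5": 1.1},
        {"G1": 1.1, "G2": 0.9, "G3": 4.0, "G4": 3.0, "G5": 1.0}]
dis = [{"G1": 3.0, "G2": 3.2, "G3": 4.5, "G4": 2.5, "G5": 1.0},
       {"G1": 3.1, "G2": 2.9, "G3": 2.5, "G4": 4.5, "G5": 0.9},
       {"G1": 2.8, "G2": 3.0, "G3": 3.8, "G4": 3.1, "G5": 1.1}]
data = [(s, 0) for s in ctrl] + [(s, 1) for s in dis]
pathways = {"pathway_A": {"G1", "G2", "G5"},
            "pathway_B": {"G3", "G4"},
            "pathway_C": {"G5", "G9"}}  # G9 is not measured, so it is dropped

ranked, scores, features, model = group_score_model(data, pathways, top_k=1)
print(ranked[0], features)  # pathway_A ['G1', 'G2', 'G5']
print(model({"G1": 3.0, "G2": 3.1, "G3": 3.0, "G4": 3.0, "G5": 1.0}))  # 1
```

Because `pathway_C` shares gene G5 with `pathway_A`, the grouping step lets one gene contribute to several groups, which is one of the complications the abstract mentions. For brevity, scoring and final training reuse the same samples here; a realistic pipeline would score groups on training folds only and report performance on held-out data.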

ISSN: 1471-2105
Source: Directory of Open Access Journals (DOAJ)

Affiliations:
Malik Yousef: Department of Information Systems, Zefat Academic College
Fatma Ozdemir: Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University
Amhar Jaber: Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University
Jens Allmer: Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Hochschule Ruhr West, University of Applied Sciences
Burcu Bakir-Gungor: Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University

Keywords: Feature selection; Feature scoring; Feature grouping; Biological knowledge integration; KEGG pathway; Classification