Machine learning based refined differential gene expression analysis of pediatric sepsis

Bibliographic Details
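Main Authors: Mostafa Abbas, Yasser EL-Manzalawy
Affiliation: Department of Imaging Science and Innovation, Geisinger Health System (both authors)
Format: Article
Language: English
Published: BMC 2020-08-01
Series: BMC Medical Genomics (ISSN 1755-8794)
Subjects: Biomarkers discovery; Differential expression analysis; Refined differential gene expression analysis; Feature selection
Online Access: http://link.springer.com/article/10.1186/s12920-020-00771-4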

Abstract

Background: Differential expression (DE) analysis of transcriptomic data enables genome-wide analysis of gene expression changes associated with biological conditions of interest. Such analysis often yields a long list of genes that are differentially expressed between two or more groups. The identified differentially expressed genes (DEGs) can then be subjected to further downstream analysis to gain biological insight, for example by determining enriched functional pathways or gene ontologies. DEGs are also treated as candidate biomarkers, and a small set of them might be selected as biomarkers using either biological knowledge or data-driven approaches.

Methods: In this work, we present a novel approach for identifying biomarkers from a list of DEGs by re-ranking them according to the Minimum Redundancy Maximum Relevance (MRMR) criterion using a repeated cross-validation feature selection procedure.

Results: Using gene expression profiles for 199 children with sepsis and septic shock, we identify 108 DEGs and propose a 10-gene signature for reliably predicting pediatric sepsis mortality with an estimated area under the ROC curve (AUC) of 0.89.

Conclusions: Machine learning based refinement of DE analysis is a promising tool for prioritizing DEGs and discovering biomarkers from gene expression profiles. Moreover, our reported 10-gene signature for pediatric sepsis mortality may facilitate the development of reliable diagnostic and prognostic biomarkers for sepsis.
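The abstract does not state which relevance and redundancy measures were used, how many repeats and folds were run, or how per-fold rankings were combined. The following is therefore only a minimal sketch of the general idea, under stated assumptions: relevance is the absolute Pearson correlation between a gene and the binary outcome, redundancy is the mean absolute correlation with already-selected genes (the original MRMR formulation uses mutual information instead), MRMR is run on the training part of each fold only, and genes are re-ranked by their mean MRMR position across all folds and repeats. The function names `pearson`, `mrmr_rank` and `repeated_cv_mrmr` are hypothetical, as is the toy data.

```python
import random
import statistics
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def mrmr_rank(X: Dict[str, List[float]], y: List[int]) -> List[str]:
    """Greedy MRMR in difference form (assumed correlation-based measures):
    score(g) = |corr(g, label)| - mean |corr(g, s)| over selected genes s."""
    relevance = {g: abs(pearson(v, y)) for g, v in X.items()}
    red_sum = {g: 0.0 for g in X}  # running sum of |corr| with selected genes
    remaining = sorted(X)          # sorted so that ties break by gene name
    selected: List[str] = []
    while remaining:
        k = len(selected)
        best = max(remaining,
                   key=lambda g: relevance[g] - (red_sum[g] / k if k else 0.0))
        selected.append(best)
        remaining.remove(best)
        for g in remaining:
            red_sum[g] += abs(pearson(X[g], X[best]))
    return selected


def repeated_cv_mrmr(X: Dict[str, List[float]], y: List[int],
                     n_repeats: int = 10, n_folds: int = 5,
                     seed: int = 0) -> List[str]:
    """Re-rank genes by their mean MRMR position over repeated k-fold CV.
    In every fold, MRMR sees only the training samples."""
    rng = random.Random(seed)
    n = len(y)
    rank_sum = {g: 0.0 for g in X}
    runs = 0
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)
        for f in range(n_folds):
            held_out = set(order[f::n_folds])
            train = [i for i in range(n) if i not in held_out]
            X_tr = {g: [v[i] for i in train] for g, v in X.items()}
            y_tr = [y[i] for i in train]
            for pos, g in enumerate(mrmr_rank(X_tr, y_tr)):
                rank_sum[g] += pos
            runs += 1
    # Lower mean position = more important; ties broken by gene name.
    return sorted(X, key=lambda g: (rank_sum[g] / runs, g))


if __name__ == "__main__":
    y = [0, 0, 0, 0, 1, 1, 1, 1]  # toy outcome labels
    X = {
        "g1": [0.4, -0.4, 0.4, -0.4, 1.4, 0.6, 1.4, 0.6],
        "g2": [0.5, -0.3, 0.3, -0.5, 1.5, 0.7, 1.3, 0.5],  # near-copy of g1
        "g3": [0.6, 0.6, -0.6, -0.6, 1.6, 1.6, 0.4, 0.4],
        "g4": [1, -1, 1, -1, 1, -1, 1, -1],                 # uncorrelated with y
    }
    print(mrmr_rank(X, y))  # ['g1', 'g3', 'g2', 'g4']
    print(repeated_cv_mrmr(X, y, n_repeats=5, n_folds=4))
```

On the toy data, ranking by relevance alone would give g1, g2, g3, g4; MRMR moves g2 behind g3 because g2 nearly duplicates g1. With an unbalanced outcome such as mortality, stratified folds would be a sensible replacement for the plain shuffled folds assumed here.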
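The reported 0.89 is an estimated area under the ROC curve, and the abstract does not describe the classifier that produced the predictions. As an illustration of what the number measures, here is a minimal sketch of the rank-based (Mann-Whitney) form of AUC: the fraction of (non-survivor, survivor) pairs in which the non-survivor receives the higher predicted risk, with ties counted as one half. The function name `roc_auc` and the risk values are hypothetical. For the AUC to be an honest estimate, the scores should come from samples held out from both model fitting and feature selection.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of a random positive > score of a random negative),
    ties counted as 1/2; equals Mann-Whitney U / (n_pos * n_neg)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical predicted mortality risks (1 = non-survivor, 0 = survivor)
    risk = [0.9, 0.8, 0.35, 0.6, 0.2]
    died = [1, 1, 1, 0, 0]
    # 5 of the 6 positive/negative pairs are ordered correctly, so AUC = 5/6
    print(roc_auc(risk, died))
```

The pairwise loop is quadratic, which is harmless at the scale of a 199-patient cohort; a sort-based rank computation gives the same value in O(n log n).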