Learning complex dependency structure of gene regulatory networks from high dimensional microarray data with Gaussian Bayesian networks

Abstract Reconstruction of Gene Regulatory Networks (GRNs) from gene expression data with Probabilistic Network Models (PNMs) is an open problem. Gene expression datasets consist of thousands of genes with relatively small sample sizes (i.e. they are large-p-small-n). Moreover, dependencies of various order...

Bibliographic Details
Main Authors: Catharina E. Graafland, José M. Gutiérrez
Format: Article
Language: English
Published: Nature Portfolio 2022-11-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-022-21957-z
author Catharina E. Graafland
José M. Gutiérrez
collection DOAJ
description Abstract Reconstruction of Gene Regulatory Networks (GRNs) from gene expression data with Probabilistic Network Models (PNMs) is an open problem. Gene expression datasets consist of thousands of genes with relatively small sample sizes (i.e. they are large-p-small-n). Moreover, dependencies of various orders coexist in the datasets. On the one hand, transcription-factor-encoding genes act as hubs and regulate target genes; on the other hand, target genes show local dependencies. In the field of Undirected Network Models (UNMs), a subclass of PNMs, the Glasso algorithm has been proposed to deal with high-dimensional microarray datasets by enforcing sparsity. To address the complex structure of interactions, modifications of the default Glasso algorithm have been developed that build the expected dependency structure into the UNMs beforehand. In this work we advocate the use of a simple score-based Hill Climbing (HC) algorithm that learns Gaussian Bayesian networks based on directed acyclic graphs. We compare HC with Glasso and its variants in the UNM framework on their ability to reconstruct GRNs from microarray data, using both the synthetic benchmark dataset of the DREAM5 challenge and real-world data from the Escherichia coli genome. We conclude that dependencies in complex data are learned best by the HC algorithm, which represents them most accurately and efficiently, simultaneously modelling the strong local and the weaker but significant global connections that coexist in the gene expression dataset. The HC algorithm adapts intrinsically to the complex dependency structure of the dataset, without forcing a specific structure in advance.
first_indexed 2024-04-13T15:36:51Z
format Article
id doaj.art-942083521442428b818cceb28e88b060
institution Directory of Open Access Journals
issn 2045-2322
language English
last_indexed 2024-04-13T15:36:51Z
publishDate 2022-11-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-942083521442428b818cceb28e88b060; 2022-12-22T02:41:15Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2022-11-01; vol. 12, no. 1, pp. 1-18; doi:10.1038/s41598-022-21957-z; Learning complex dependency structure of gene regulatory networks from high dimensional microarray data with Gaussian Bayesian networks; Catharina E. Graafland, José M. Gutiérrez (both Instituto de Física de Cantabria, CSIC-Universidad de Cantabria); https://doi.org/10.1038/s41598-022-21957-z
title Learning complex dependency structure of gene regulatory networks from high dimensional microarray data with Gaussian Bayesian networks
url https://doi.org/10.1038/s41598-022-21957-z