Bayesian functional enrichment analysis for the Reactome database


Bibliographic Details
Main Author: Jing Cao
Format: Article
Language: English
Published: Taylor & Francis Group 2017-07-01
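Series: Statistical Theory and Related Fields
Subjects: functional enrichment analysis; Reactome; hypergeometric P value; Bayesian hierarchical logistic model; conditional autoregressive prior
Online Access: http://dx.doi.org/10.1080/24754269.2017.1387444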
author Jing Cao
collection DOAJ
description The first step in analyzing high-throughput experiment results is often to identify genes or proteins with certain characteristics, such as differentially expressed (DE) genes. To gain more insight into the underlying biology, functional enrichment analysis is then conducted to provide a functional interpretation of the identified genes or proteins. The hypergeometric P value has been widely used to investigate whether genes from predefined functional terms, e.g., those in Reactome, are enriched among the DE genes. The hypergeometric P value has several limitations: (1) it is computed independently for each term, thus neglecting biological dependence; (2) it is subject to a size constraint that tends to favor less-specific terms. In this paper, a Bayesian approach is proposed to overcome these limitations by incorporating the interconnected dependence structure of biological functions in the Reactome database through a conditional autoregressive (CAR) prior in a Bayesian hierarchical logistic model. Inference on functional enrichment is then based on posterior probabilities that are immune to the size constraint. The method can detect moderate but consistent enrichment signals and identify sets of closely related, biologically meaningful functional terms rather than isolated terms. The performance of the Bayesian method is demonstrated via a simulation study and a real data application.
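As a rough illustration of the hypergeometric P value mentioned in the description, the following Python sketch computes the upper-tail probability for a single functional term. The function name `hypergeom_enrichment_p`, its parameter names, and the example counts are hypothetical and are not taken from the article. The sketch covers only the baseline test, not the proposed Bayesian CAR model.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_in_term, n_de, n_de_in_term):
    """Upper-tail hypergeometric P value for one functional term.

    Hypothetical parameter names (not from the article):
      n_genes       total number of genes in the background
      n_in_term     number of background genes annotated to the term
      n_de          number of differentially expressed (DE) genes
      n_de_in_term  number of DE genes annotated to the term

    Returns P(X >= n_de_in_term), where X is the number of term genes
    found when n_de genes are drawn without replacement.
    """
    upper = min(n_in_term, n_de)
    total = comb(n_genes, n_de)
    # Sum the probability mass from the observed overlap up to the maximum overlap.
    tail = sum(
        comb(n_in_term, i) * comb(n_genes - n_in_term, n_de - i)
        for i in range(n_de_in_term, upper + 1)
    )
    return tail / total


# Illustrative counts only: a small term whose two genes are both DE.
p_small = hypergeom_enrichment_p(1000, 2, 100, 2)
# A larger term with 40 genes, 12 of them DE, against the same background.
p_large = hypergeom_enrichment_p(1000, 40, 100, 12)
print(p_small, p_large)
```

With these made-up counts, the two-gene term cannot obtain a P value below about 0.0099 even though every gene in it is DE, which is one way to see the size constraint noted in the description. Each term is also tested on its own, so no information is shared between related terms.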
first_indexed 2024-03-11T22:39:59Z
format Article
id doaj.art-86983af83dc043c48600a80d0f4ecb9f
institution Directory Open Access Journal
issn 2475-4269, 2475-4277
language English
last_indexed 2024-03-11T22:39:59Z
publishDate 2017-07-01
publisher Taylor & Francis Group
record_format Article
series Statistical Theory and Related Fields
citation Jing Cao (Southern Methodist University). Bayesian functional enrichment analysis for the Reactome database. Statistical Theory and Related Fields, vol. 1, no. 2, pp. 185-193, 2017-07-01. doi:10.1080/24754269.2017.1387444
title Bayesian functional enrichment analysis for the Reactome database
topic functional enrichment analysis
reactome
hypergeometric p value
bayesian hierarchical logistic model
conditional autoregressive prior