Bayesian functional enrichment analysis for the Reactome database
The first step in the analysis of high-throughput experiment results is often to identify genes or proteins with certain characteristics, such as genes being differentially expressed (DE). To gain more insights into the underlying biology, functional enrichment analysis is then conducted to provide...
Main Author: | Jing Cao |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2017-07-01 |
Series: | Statistical Theory and Related Fields |
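Subjects: | functional enrichment analysis; Reactome; hypergeometric P value; Bayesian hierarchical logistic model; conditional autoregressive prior |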
Online Access: | http://dx.doi.org/10.1080/24754269.2017.1387444 |
_version_ | 1797677088524730368 |
---|---|
author | Jing Cao |
collection | DOAJ |
description | The first step in the analysis of high-throughput experiment results is often to identify genes or proteins with certain characteristics, such as genes being differentially expressed (DE). To gain more insights into the underlying biology, functional enrichment analysis is then conducted to provide functional interpretation for the identified genes or proteins. The hypergeometric P value has been widely used to investigate whether genes from predefined functional terms, e.g., those in Reactome, are enriched in the DE genes. The hypergeometric P value has two limitations: (1) it is computed independently for each term, thus neglecting biological dependence among terms; (2) it is subject to a size constraint that leads to a tendency to select less-specific terms. In this paper, a Bayesian approach is proposed to overcome these limitations by incorporating the interconnected dependence structure of biological functions in the Reactome database through a conditional autoregressive (CAR) prior in a Bayesian hierarchical logistic model. The inference on functional enrichment is then based on posterior probabilities that are immune to the size constraint. This method can detect moderate but consistent enrichment signals and identify sets of closely related and biologically meaningful functional terms rather than isolated terms. The performance of the Bayesian method is demonstrated via a simulation study and a real data application. |
first_indexed | 2024-03-11T22:39:59Z |
format | Article |
id | doaj.art-86983af83dc043c48600a80d0f4ecb9f |
institution | Directory Open Access Journal |
issn | 2475-4269 2475-4277 |
language | English |
last_indexed | 2024-03-11T22:39:59Z |
publishDate | 2017-07-01 |
publisher | Taylor & Francis Group |
record_format | Article |
series | Statistical Theory and Related Fields |
spelling | doaj.art-86983af83dc043c48600a80d0f4ecb9f; 2023-09-22T09:19:44Z; eng; Taylor & Francis Group; Statistical Theory and Related Fields; 2475-4269; 2475-4277; 2017-07-01; vol. 1, no. 2, pp. 185-193; 10.1080/24754269.2017.1387444; 1387444; Bayesian functional enrichment analysis for the Reactome database; Jing Cao (Southern Methodist University); http://dx.doi.org/10.1080/24754269.2017.1387444; functional enrichment analysis; reactome; hypergeometric p value; bayesian hierarchical logistic model; conditional autoregressive prior |
title | Bayesian functional enrichment analysis for the Reactome database |
topic | functional enrichment analysis reactome hypergeometric p value bayesian hierarchical logistic model conditional autoregressive prior |
url | http://dx.doi.org/10.1080/24754269.2017.1387444 |
work_keys_str_mv | AT jingcao bayesianfunctionalenrichmentanalysisforthereactomedatabase |
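
The description refers to the hypergeometric P value as the standard enrichment test and notes that it is subject to a size constraint. As a rough illustration of that baseline only (not of the article's Bayesian model), the upper-tail hypergeometric probability can be sketched as follows. The function name `hypergeom_enrichment_p`, its parameter names, and the gene counts in the usage section are hypothetical and chosen for this example; none of them come from the article.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_de, term_size, term_de):
    """Upper-tail hypergeometric P value for one functional term.

    n_total   -- number of genes in the background (assumed universe)
    n_de      -- number of differentially expressed (DE) genes
    term_size -- number of background genes annotated to the term
    term_de   -- number of DE genes annotated to the term

    Returns P(X >= term_de), where X is the number of DE genes that fall in
    the term if term_size genes are drawn at random without replacement.
    """
    if not (0 <= n_de <= n_total and 0 <= term_size <= n_total):
        raise ValueError("n_de and term_size must lie between 0 and n_total")
    upper = min(n_de, term_size)
    if not 0 <= term_de <= upper:
        raise ValueError("term_de must lie between 0 and min(n_de, term_size)")

    # Exact integer arithmetic: count favourable draws, then divide once.
    favourable = sum(
        comb(n_de, k) * comb(n_total - n_de, term_size - k)
        for k in range(term_de, upper + 1)
    )
    return favourable / comb(n_total, term_size)


if __name__ == "__main__":
    # Hypothetical counts: 1000 background genes, 100 of them DE.
    # Both terms contain 30% DE genes, but they differ in size.
    p_small = hypergeom_enrichment_p(1000, 100, term_size=10, term_de=3)
    p_large = hypergeom_enrichment_p(1000, 100, term_size=100, term_de=30)
    print(f"small, specific term: P = {p_small:.3g}")
    print(f"large, general term:  P = {p_large:.3g}")
```

With these made-up counts, both terms have the same proportion of DE genes, yet the larger term receives the smaller P value. This is one plausible reading of the size constraint mentioned in the description: ranking terms by P value tends to favour large, less-specific terms. The test also treats each term in isolation, which is the first limitation the description names.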