Bayesian Correlation Analysis for Sequence Count Data.

Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data.

Bibliographic Details
Main Authors:	Daniel Sánchez-Taltavull, Parameswaran Ramachandran, Nelson Lau, Theodore J Perkins
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2016-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC5049778?pdf=render

collection	DOAJ
description	Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.
first_indexed	2024-04-12T03:46:06Z
id	doaj.art-24019a6a3a564497b88e66d2efc34fb2
institution	Directory of Open Access Journals
issn	1932-6203
last_indexed	2024-04-12T03:46:06Z
doi	10.1371/journal.pone.0163595
citation	PLoS ONE 11(10): e0163595 (2016-01-01)

Similar Items