MGMR: leveraging RNA-Seq population data to optimize expression estimation

Bibliographic Details
Main Authors: Rozov, Roye; Halperin, Eran; Shamir, Ron
Format: Article
Language: English
Published: BMC 2012-04-01
Series: BMC Bioinformatics

Abstract

Background: RNA-Seq is a technique that uses next-generation sequencing to identify transcripts and estimate transcription levels. When applying this technique for quantification, one must contend with reads that align to multiple positions in the genome (multireads). Previous efforts to resolve multireads have shown that RNA-Seq expression estimation can be improved by probabilistically allocating reads to genes. These methods use a probabilistic generative model of the data and resolve ambiguity with likelihood-based approaches. In many instances, RNA-Seq experiments are performed in the context of a population. The generative models of current methods do not take such population information into account, and it is an open question whether this information can improve quantification of the individual samples.

Results: To explore the contribution of population-level information to RNA-Seq quantification, we apply a hierarchical probabilistic generative model, which assumes that the expression levels of different individuals are sampled from a Dirichlet distribution with population-specific parameters, and that reads are sampled from the distribution of expression levels. We introduce an optimization procedure for estimating the model parameters, and use HapMap data and simulated data to demonstrate that the model yields a significant improvement in the accuracy of expression estimates for paralogous genes.

Conclusions: We provide a proof of principle of the benefit of drawing on population commonalities to estimate expression. The results of our experiments demonstrate that this approach can be beneficial, primarily for estimation at the gene level.
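
The Results paragraph describes a two-level model: each individual's expression profile is drawn from a population-specific Dirichlet distribution, reads are drawn from that profile, and multireads are resolved by likelihood-based allocation. The following is a minimal pure-Python sketch of that idea, not the authors' MGMR code or their optimization procedure. It assumes that every read is reduced to the set of genes it aligns to, with equal alignment likelihood at each position, and it replaces the paper's estimation of the population parameters with a simple heuristic prior (`population_prior`). All function names, the paralog layout in `compat`, and the numeric settings are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) by normalizing Gamma(alpha_g, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_individual(alpha, n_reads, compat, rng):
    """Two-level generative sketch: theta ~ Dirichlet(alpha) for this individual,
    then each read's source gene ~ Categorical(theta). A read is observed only as
    the set of genes it aligns to, drawn uniformly from compat[gene] (hypothetical)."""
    theta = sample_dirichlet(alpha, rng)
    genes = rng.choices(range(len(alpha)), weights=theta, k=n_reads)
    reads = [rng.choice(compat[g]) for g in genes]
    return theta, reads


def em_expression(reads, n_genes, alpha=None, iters=100):
    """EM allocation of multireads to genes for one individual.

    alpha=None gives the plain maximum-likelihood estimate. With a Dirichlet
    prior alpha (every entry >= 1) the M-step returns the MAP estimate, which
    amounts to adding alpha_g - 1 pseudocounts to gene g.
    """
    theta = [1.0 / n_genes] * n_genes
    for _ in range(iters):
        counts = [0.0] * n_genes
        for aln in reads:
            # E-step: split each read among its compatible genes in proportion
            # to theta (assumes equal alignment likelihood at every position).
            z = sum(theta[g] for g in aln)
            for g in aln:
                counts[g] += theta[g] / z
        # M-step: normalize expected counts, plus pseudocounts if a prior is given.
        if alpha is not None:
            counts = [c + a - 1.0 for c, a in zip(counts, alpha)]
        total = sum(counts)
        theta = [c / total for c in counts]
    return theta


def population_prior(reads_per_individual, n_genes, concentration):
    """Heuristic stand-in for the population-level parameters (not the paper's
    optimization procedure): pool every individual's reads, run plain EM, and
    turn the pooled profile into a Dirichlet prior whose entries are all >= 1."""
    pooled = [r for reads in reads_per_individual for r in reads]
    profile = em_expression(pooled, n_genes)
    return [1.0 + concentration * p for p in profile]


def mean_abs_error(estimate, truth):
    return sum(abs(e - t) for e, t in zip(estimate, truth)) / len(truth)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical layout: genes 0 and 1 are paralogs whose reads may map to both;
    # gene 2 only yields unique reads.
    compat = {0: [(0,), (0, 1)], 1: [(1,), (0, 1)], 2: [(2,)]}
    true_alpha = [20.0, 10.0, 30.0]  # hypothetical population parameters
    sims = [simulate_individual(true_alpha, 200, compat, rng) for _ in range(30)]
    prior = population_prior([reads for _, reads in sims], 3, concentration=50.0)
    ml = sum(mean_abs_error(em_expression(r, 3), t) for t, r in sims) / len(sims)
    mp = sum(mean_abs_error(em_expression(r, 3, prior), t) for t, r in sims) / len(sims)
    print(f"mean abs error: per-sample ML {ml:.4f}, with population prior {mp:.4f}")
```

Running the script prints the mean absolute error of both estimators over the simulated individuals; how much the prior helps depends on the hypothetical settings, so the numbers are only illustrative. Note that this sketch builds the prior from the same reads it is then applied to (an empirical-Bayes shortcut), whereas the abstract describes a dedicated optimization procedure for estimating the model parameters.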

Citation: BMC Bioinformatics 2012, 13(Suppl 6):S2
ISSN: 1471-2105
DOI: 10.1186/1471-2105-13-S6-S2