Local sequence and sequencing depth dependent accuracy of RNA-seq reads
Abstract. Background: Many biases and spurious effects are inherent in RNA-seq technology, resulting in a non-uniform distribution of sequencing read counts for each base position in a gene. Therefore, a base-level strategy is required to model the non-uniformity. Also, the properties of sequencing re...
Main Authors: | Guoshuai Cai, Shoudan Liang, Xiaofeng Zheng, Feifei Xiao |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2017-08-01 |
Series: | BMC Bioinformatics |
Subjects: | RNA-seq, Non-uniformity, Bias, Base-level modeling, Overdispersion, Beta-binomial |
Online Access: | http://link.springer.com/article/10.1186/s12859-017-1780-z |
_version_ | 1818276522103930880 |
---|---|
author | Guoshuai Cai, Shoudan Liang, Xiaofeng Zheng, Feifei Xiao |
author_facet | Guoshuai Cai, Shoudan Liang, Xiaofeng Zheng, Feifei Xiao |
author_sort | Guoshuai Cai |
collection | DOAJ |
description | Abstract. Background: Many biases and spurious effects are inherent in RNA-seq technology, resulting in a non-uniform distribution of sequencing read counts for each base position in a gene. Therefore, a base-level strategy is required to model the non-uniformity. Also, the properties of sequencing read counts can be leveraged to achieve a more precise estimation of the mean and variance of measurement. Results: In this study, we aimed to unveil the effects on RNA-seq accuracy from multiple factors and develop accurate modeling of RNA-seq reads in comparison. We found that the overdispersion rate decreased when sequencing depth increased on the base level. Moreover, the influence of local sequence(s) on the overdispersion rate was notable but no longer significant after adjusting the effect from sequencing depth. Based on these findings, we propose a desirable beta-binomial model with a dynamic overdispersion rate on the base-level proportion of sequencing read counts from two samples. Conclusions: The current study provides thorough insights into the impact of overdispersion at the position level and especially into its relationship with sequencing depth, local sequence, and preparation protocol. These properties of RNA-seq will aid in improvement of the quality control procedure and development of statistical methods for RNA-seq downstream analyses. |
first_indexed | 2024-12-12T22:46:59Z |
format | Article |
id | doaj.art-d039be90eb2446978d22aafd04c6b2bb |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-12T22:46:59Z |
publishDate | 2017-08-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-d039be90eb2446978d22aafd04c6b2bb; 2022-12-22T00:09:10Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2017-08-01; 18; 1; 1; 12; 10.1186/s12859-017-1780-z; Local sequence and sequencing depth dependent accuracy of RNA-seq reads; Guoshuai Cai (0); Shoudan Liang (1); Xiaofeng Zheng (2); Feifei Xiao (3); (0) Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth; (1) Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center; (2) Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center; (3) Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina; Abstract. Background: Many biases and spurious effects are inherent in RNA-seq technology, resulting in a non-uniform distribution of sequencing read counts for each base position in a gene. Therefore, a base-level strategy is required to model the non-uniformity. Also, the properties of sequencing read counts can be leveraged to achieve a more precise estimation of the mean and variance of measurement. Results: In this study, we aimed to unveil the effects on RNA-seq accuracy from multiple factors and develop accurate modeling of RNA-seq reads in comparison. We found that the overdispersion rate decreased when sequencing depth increased on the base level. Moreover, the influence of local sequence(s) on the overdispersion rate was notable but no longer significant after adjusting the effect from sequencing depth. Based on these findings, we propose a desirable beta-binomial model with a dynamic overdispersion rate on the base-level proportion of sequencing read counts from two samples. Conclusions: The current study provides thorough insights into the impact of overdispersion at the position level and especially into its relationship with sequencing depth, local sequence, and preparation protocol. These properties of RNA-seq will aid in improvement of the quality control procedure and development of statistical methods for RNA-seq downstream analyses.; http://link.springer.com/article/10.1186/s12859-017-1780-z; RNA-seq; Non-uniformity; Bias; Base-level modeling; Overdispersion; Beta-binomial |
title | Local sequence and sequencing depth dependent accuracy of RNA-seq reads |
title_sort | local sequence and sequencing depth dependent accuracy of rna seq reads |
topic | RNA-seq, Non-uniformity, Bias, Base-level modeling, Overdispersion, Beta-binomial |
url | http://link.springer.com/article/10.1186/s12859-017-1780-z |
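
The abstract describes a beta-binomial model for the base-level proportion of read counts from two samples, with an overdispersion rate that decreases as sequencing depth increases. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes the common parameterization in which the overdispersion rate is rho = 1 / (a + b + 1) for Beta(a, b); the function names and the form of `dynamic_rho` (including `rho0` and `scale`) are hypothetical and chosen only to illustrate a rate that shrinks with depth.

```python
from math import exp, lgamma


def beta_binomial_pmf(k, n, p, rho):
    """P(K = k) when K of n reads at a base come from sample 1.

    p is the mean proportion, rho in (0, 1) the overdispersion rate,
    using the assumed parameterization rho = 1 / (a + b + 1).
    """
    s = 1.0 / rho - 1.0              # s = a + b
    a, b = p * s, (1.0 - p) * s      # Beta shape parameters with mean p
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_beta_num = lgamma(k + a) + lgamma(n - k + b) - lgamma(n + a + b)
    log_beta_den = lgamma(a) + lgamma(b) - lgamma(a + b)
    return exp(log_choose + log_beta_num - log_beta_den)


def beta_binomial_variance(n, p, rho):
    """Variance of the count: binomial variance inflated by 1 + (n - 1) * rho."""
    return n * p * (1.0 - p) * (1.0 + (n - 1) * rho)


def dynamic_rho(depth, rho0=0.05, scale=100.0):
    """Hypothetical depth-dependent overdispersion rate (illustration only).

    It decreases monotonically with depth; the paper's fitted relationship
    is not given in this record and may differ.
    """
    return rho0 / (1.0 + depth / scale)
```

For example, `beta_binomial_variance(n, p, dynamic_rho(n))` gives the variance of the count at a base covered by `n` reads in total under these assumptions; with `rho` close to 0 it approaches the plain binomial variance `n * p * (1 - p)`.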