Local sequence and sequencing depth dependent accuracy of RNA-seq reads

Bibliographic Details
Main Authors: Guoshuai Cai, Shoudan Liang, Xiaofeng Zheng, Feifei Xiao
Format: Article
Language: English
Published: BMC 2017-08-01
Series: BMC Bioinformatics
Online Access: http://link.springer.com/article/10.1186/s12859-017-1780-z

Abstract

Background: Many biases and spurious effects are inherent in RNA-seq technology, resulting in a non-uniform distribution of sequencing read counts across the base positions of a gene. A base-level strategy is therefore required to model this non-uniformity. In addition, the properties of sequencing read counts can be leveraged to estimate the mean and variance of the measurement more precisely.

Results: In this study, we aimed to characterize how multiple factors affect RNA-seq accuracy, and to develop and compare accurate models of RNA-seq reads. We found that, at the base level, the overdispersion rate decreased as sequencing depth increased. The influence of local sequence on the overdispersion rate was also notable, but it was no longer significant after adjusting for sequencing depth. Based on these findings, we propose a beta-binomial model with a dynamic overdispersion rate for the base-level proportion of sequencing read counts from two samples.

Conclusions: This study provides detailed insight into overdispersion at the position level, especially its relationship with sequencing depth, local sequence, and preparation protocol. These properties of RNA-seq will aid in improving quality-control procedures and in developing statistical methods for downstream RNA-seq analyses.
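
The study models the base-level proportion of read counts from two samples with a beta-binomial whose overdispersion rate changes with sequencing depth. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: at each base, sample A's count x is treated as beta-binomial out of the combined count n = x + y, with mean proportion p and overdispersion rate rho. The logistic link logit(rho) = a + b * log(n) and the helper names `depth_dependent_rho`, `betabinom_logpmf`, `betabinom_variance`, and `base_level_loglik` are hypothetical choices made here to illustrate a depth-dependent ("dynamic") overdispersion rate; this record does not give the paper's actual parameterization.

```python
import math
from typing import Sequence


def log_beta(a: float, b: float) -> float:
    """log B(a, b), computed through log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def depth_dependent_rho(n: int, a: float, b: float) -> float:
    """Hypothetical dynamic overdispersion rate: logit(rho) = a + b * log(n).

    With b < 0, rho shrinks as the combined depth n grows. The functional
    form and the parameters a, b are illustrative assumptions only.
    """
    eta = a + b * math.log(n)
    return 1.0 / (1.0 + math.exp(-eta))


def betabinom_logpmf(k: int, n: int, p: float, rho: float) -> float:
    """log P(K = k) for a beta-binomial with mean proportion p and
    overdispersion (intra-class correlation) rho, both in (0, 1)."""
    if not (0.0 < p < 1.0 and 0.0 < rho < 1.0):
        raise ValueError("p and rho must lie strictly between 0 and 1")
    # Map (p, rho) to the usual shape parameters, using rho = 1 / (alpha + beta + 1).
    alpha = p * (1.0 - rho) / rho
    beta = (1.0 - p) * (1.0 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def betabinom_variance(n: int, p: float, rho: float) -> float:
    """Variance n*p*(1-p)*(1 + (n-1)*rho): rho inflates the binomial variance."""
    return n * p * (1.0 - p) * (1.0 + (n - 1) * rho)


def base_level_loglik(counts_a: Sequence[int], counts_b: Sequence[int],
                      p: float, a: float, b: float) -> float:
    """Sum of per-base log-likelihoods. At each position, sample A's count x
    is modeled as beta-binomial out of n = x + y, with rho depending on n."""
    total = 0.0
    for x, y in zip(counts_a, counts_b):
        n = x + y
        if n == 0:
            continue  # an uncovered base carries no information about p
        total += betabinom_logpmf(x, n, p, depth_dependent_rho(n, a, b))
    return total


if __name__ == "__main__":
    # Toy per-base counts for two samples, made up for illustration only.
    sample_a = [3, 10, 0, 25]
    sample_b = [5, 12, 0, 30]
    print(base_level_loglik(sample_a, sample_b, p=0.45, a=-1.0, b=-0.5))
    for depth in (10, 100, 1000):
        print(depth, depth_dependent_rho(depth, a=-1.0, b=-0.5))
```

Under this parameterization the variance n p (1 - p)(1 + (n - 1) rho) exceeds the binomial variance whenever rho > 0, and a negative b lets rho shrink as depth increases. Estimating p, a, and b (for example by maximizing `base_level_loglik` with a numerical optimizer) is left out of the sketch.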

Keywords: RNA-seq; Non-uniformity; Bias; Base-level modeling; Overdispersion; Beta-binomial

Author affiliations:
Guoshuai Cai: Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth
Shoudan Liang: Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center
Xiaofeng Zheng: Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center
Feifei Xiao: Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina

ISSN: 1471-2105
DOI: 10.1186/s12859-017-1780-z
Collection: DOAJ (Directory of Open Access Journals)