Short paired-end reads trump long single-end reads for expression analysis


Bibliographic Details
Main Authors: Adam H. Freedman, John M. Gaspar, Timothy B. Sackton
Format: Article
Language: English
Published: BMC 2020-04-01
Series: BMC Bioinformatics
Subjects: RNA-seq; Short read sequencing; Differential expression
Online Access: http://link.springer.com/article/10.1186/s12859-020-3484-z
author Adam H. Freedman
John M. Gaspar
Timothy B. Sackton
collection DOAJ
description Abstract
Background: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases.
Results: At both the transcript and gene levels, 2 × 40 paired-end reads unequivocally provide expression estimates that are more highly correlated with those from 2 × 125 reads than are estimates from 1 × 75 reads; in nearly all cases, those correlations are also greater than for 1 × 125 reads, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2 × 40 reads consistently outperform those using 1 × 75 reads.
Conclusion: Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level.
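The read layouts compared in the abstract differ in how many bases are sequenced per fragment, which is the basis of the "approximately the same total number of sequenced bases" comparison. The short Python sketch below only works through that arithmetic; the helper name bases_per_fragment and the dictionary layout are illustrative assumptions, not part of the authors' analysis.

```python
# Illustrative arithmetic only; `bases_per_fragment` is a hypothetical helper,
# not code from the paper.
def bases_per_fragment(reads_per_fragment: int, read_length: int) -> int:
    """Total bases sequenced for one fragment under a given read layout."""
    return reads_per_fragment * read_length


# Read layouts named in the abstract: (reads per fragment, read length in bases).
designs = {
    "2 x 40": (2, 40),    # short paired-end
    "1 x 75": (1, 75),    # single-end of roughly equal total bases
    "1 x 125": (1, 125),  # longer single-end
    "2 x 125": (2, 125),  # long paired-end used as the reference
}

totals = {name: bases_per_fragment(*layout) for name, layout in designs.items()}

for name, total in totals.items():
    print(f"{name}: {total} bases per fragment")
```

Under these layouts, 2 × 40 yields 80 bases per fragment against 75 for 1 × 75, so the two are close in sequencing effort, while 1 × 125 sequences more bases than 2 × 40.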
first_indexed 2024-12-19T14:59:49Z
format Article
id doaj.art-ec4b6f7fa9644b899767bee2b0221432
institution Directory of Open Access Journals
issn 1471-2105
language English
last_indexed 2024-12-19T14:59:49Z
publishDate 2020-04-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-ec4b6f7fa9644b899767bee2b0221432 | 2022-12-21T20:16:36Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2020-04-01 | 211111 | 10.1186/s12859-020-3484-z | Short paired-end reads trump long single-end reads for expression analysis | Adam H. Freedman, John M. Gaspar, Timothy B. Sackton (all: Informatics Group, Harvard University) | http://link.springer.com/article/10.1186/s12859-020-3484-z | RNA-seq; Short read sequencing; Differential expression
title Short paired-end reads trump long single-end reads for expression analysis
topic RNA-seq
Short read sequencing
Differential expression
url http://link.springer.com/article/10.1186/s12859-020-3484-z