Short paired-end reads trump long single-end reads for expression analysis
Main Authors: | Adam H. Freedman, John M. Gaspar, Timothy B. Sackton |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2020-04-01 |
Series: | BMC Bioinformatics |
Subjects: | RNA-seq, Short read sequencing, Differential expression |
Online Access: | http://link.springer.com/article/10.1186/s12859-020-3484-z |
_version_ | 1818881309373628416 |
---|---|
author | Adam H. Freedman John M. Gaspar Timothy B. Sackton |
author_sort | Adam H. Freedman |
collection | DOAJ |
description | Abstract. Background: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases. Results: At both the transcript and gene levels, expression estimates from 2 × 40 paired-end reads are unequivocally more highly correlated with those from 2 × 125 reads than are estimates from 1 × 75 reads; in nearly all cases, those correlations are also greater than for 1 × 125 reads, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2 × 40 reads consistently outperform those using 1 × 75 reads. Conclusion: Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level. |
first_indexed | 2024-12-19T14:59:49Z |
format | Article |
id | doaj.art-ec4b6f7fa9644b899767bee2b0221432 |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-19T14:59:49Z |
publishDate | 2020-04-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-ec4b6f7fa9644b899767bee2b0221432; 2022-12-21T20:16:36Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2020-04-01; 211111; 10.1186/s12859-020-3484-z; Short paired-end reads trump long single-end reads for expression analysis; Adam H. Freedman 0; John M. Gaspar 1; Timothy B. Sackton 2; Informatics Group, Harvard University; Informatics Group, Harvard University; Informatics Group, Harvard University; Abstract Background Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases. Results At both the transcript and gene levels, 2 × 40 paired-end reads unequivocally provide expression estimates that are more highly correlated with 2 × 125 than 1 × 75 reads; in nearly all cases, those correlations are also greater than for 1 × 125, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2 × 40 consistently outperform those using 1 × 75. Conclusion Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level.; http://link.springer.com/article/10.1186/s12859-020-3484-z; RNA-seq; Short read sequencing; Differential expression |
title | Short paired-end reads trump long single-end reads for expression analysis |
topic | RNA-seq Short read sequencing Differential expression |
url | http://link.springer.com/article/10.1186/s12859-020-3484-z |
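
The abstract compares read designs by their total number of sequenced bases, and that count follows from simple arithmetic. The sketch below is only an illustration of that arithmetic in Python; the function name `bases_per_fragment` and the `designs` dictionary are assumptions introduced here, not part of the record or of the article's methods.

```python
def bases_per_fragment(reads_per_fragment: int, read_length: int) -> int:
    """Total sequenced bases for one fragment: reads per fragment times read length."""
    return reads_per_fragment * read_length


# Read designs named in the abstract, as (reads per fragment, read length in bases).
# Paired-end designs contribute two reads per fragment, single-end designs one.
designs = {
    "2 x 40": (2, 40),
    "1 x 75": (1, 75),
    "1 x 125": (1, 125),
    "2 x 125": (2, 125),
}

totals = {
    name: bases_per_fragment(n_reads, length)
    for name, (n_reads, length) in designs.items()
}

for name, total in totals.items():
    print(f"{name}: {total} bases per fragment")
```

On this count, 2 × 40 and 1 × 75 are close (80 versus 75 bases per fragment), which is the sense in which the abstract treats them as approximately equal in sequenced bases, while 1 × 125 sequences more bases than either.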