Improved transcriptome assembly using a hybrid of long and short reads with StringTie.
Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate...
Main Authors: | Alaina Shumate, Brandon Wong, Geo Pertea, Mihaela Pertea |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2022-06-01 |
Series: | PLoS Computational Biology |
Online Access: | https://doi.org/10.1371/journal.pcbi.1009730 |
_version_ | 1797962574544764928 |
---|---|
author | Alaina Shumate Brandon Wong Geo Pertea Mihaela Pertea |
author_facet | Alaina Shumate Brandon Wong Geo Pertea Mihaela Pertea |
author_sort | Alaina Shumate |
collection | DOAJ |
description | Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie. |
first_indexed | 2024-04-11T01:15:18Z |
format | Article |
id | doaj.art-77912f9974784c0eb69d209fba1c6910 |
institution | Directory Open Access Journal |
issn | 1553-734X 1553-7358 |
language | English |
last_indexed | 2024-04-11T01:15:18Z |
publishDate | 2022-06-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS Computational Biology |
spelling | doaj.art-77912f9974784c0eb69d209fba1c6910 2023-01-04T05:30:50Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2022-06-01 18 6 e1009730 10.1371/journal.pcbi.1009730 Improved transcriptome assembly using a hybrid of long and short reads with StringTie. Alaina Shumate Brandon Wong Geo Pertea Mihaela Pertea Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie. https://doi.org/10.1371/journal.pcbi.1009730 |
spellingShingle | Alaina Shumate Brandon Wong Geo Pertea Mihaela Pertea Improved transcriptome assembly using a hybrid of long and short reads with StringTie. PLoS Computational Biology |
title | Improved transcriptome assembly using a hybrid of long and short reads with StringTie. |
title_sort | improved transcriptome assembly using a hybrid of long and short reads with stringtie |
url | https://doi.org/10.1371/journal.pcbi.1009730 |
work_keys_str_mv | AT alainashumate improvedtranscriptomeassemblyusingahybridoflongandshortreadswithstringtie AT brandonwong improvedtranscriptomeassemblyusingahybridoflongandshortreadswithstringtie AT geopertea improvedtranscriptomeassemblyusingahybridoflongandshortreadswithstringtie AT mihaelapertea improvedtranscriptomeassemblyusingahybridoflongandshortreadswithstringtie |