Improved transcriptome assembly using a hybrid of long and short reads with StringTie.

Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate...

Bibliographic Details
Main Authors: Alaina Shumate, Brandon Wong, Geo Pertea, Mihaela Pertea
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-06-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1009730
author Alaina Shumate
Brandon Wong
Geo Pertea
Mihaela Pertea
collection DOAJ
description Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie.
first_indexed 2024-04-11T01:15:18Z
format Article
id doaj.art-77912f9974784c0eb69d209fba1c6910
institution Directory Open Access Journal
issn 1553-734X
1553-7358
language English
last_indexed 2024-04-11T01:15:18Z
publishDate 2022-06-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Computational Biology
spelling doaj.art-77912f9974784c0eb69d209fba1c6910; 2023-01-04T05:30:50Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2022-06-01; 18(6): e1009730; 10.1371/journal.pcbi.1009730; Improved transcriptome assembly using a hybrid of long and short reads with StringTie.; Alaina Shumate, Brandon Wong, Geo Pertea, Mihaela Pertea; https://doi.org/10.1371/journal.pcbi.1009730
title Improved transcriptome assembly using a hybrid of long and short reads with StringTie.
url https://doi.org/10.1371/journal.pcbi.1009730