GTax: improving de novo transcriptome assembly by removing foreign RNA contamination

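Abstract The cost and complexity of generating a complete reference genome means that many organisms lack an annotated reference. An alternative is to use a de novo reference transcriptome. This technology is cost-effective but is susceptible to off-target RNA contamination. In this manuscript, we present GTax, a taxonomy-structured database of genomic sequences that can be used with BLAST to detect and remove foreign contamination in RNA sequencing samples before assembly. In addition, we use a de novo transcriptome assembly of Solanum lycopersicum (tomato) to demonstrate that removing foreign contamination in sequencing samples reduces the number of assembled chimeric transcripts.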

Bibliographic Details
Main Authors: Roberto Vera Alvarez, David Landsman
Format: Article
Language: English
Published: BMC 2024-01-01
Series: Genome Biology
Online Access: https://doi.org/10.1186/s13059-023-03141-2
author Roberto Vera Alvarez
David Landsman
collection DOAJ
description Abstract The cost and complexity of generating a complete reference genome means that many organisms lack an annotated reference. An alternative is to use a de novo reference transcriptome. This technology is cost-effective but is susceptible to off-target RNA contamination. In this manuscript, we present GTax, a taxonomy-structured database of genomic sequences that can be used with BLAST to detect and remove foreign contamination in RNA sequencing samples before assembly. In addition, we use a de novo transcriptome assembly of Solanum lycopersicum (tomato) to demonstrate that removing foreign contamination in sequencing samples reduces the number of assembled chimeric transcripts.
first_indexed 2024-03-08T14:14:54Z
format Article
id doaj.art-2c915c63d4ab493b8cc65bce4cdf2ca6
institution Directory of Open Access Journals
issn 1474-760X
language English
last_indexed 2024-03-08T14:14:54Z
publishDate 2024-01-01
publisher BMC
record_format Article
series Genome Biology
spelling doaj.art-2c915c63d4ab493b8cc65bce4cdf2ca6; 2024-01-14T12:25:08Z; eng; BMC; Genome Biology; 1474-760X; 2024-01-01; 251121; 10.1186/s13059-023-03141-2; GTax: improving de novo transcriptome assembly by removing foreign RNA contamination; Roberto Vera Alvarez [0]; David Landsman [1]; [0] Computational Biology Branch, National Center for Biotechnology Information, Intramural Research Program, National Library of Medicine, NIH; [1] Computational Biology Branch, National Center for Biotechnology Information, Intramural Research Program, National Library of Medicine, NIH; https://doi.org/10.1186/s13059-023-03141-2
title GTax: improving de novo transcriptome assembly by removing foreign RNA contamination
url https://doi.org/10.1186/s13059-023-03141-2