Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies

Background De novo transcriptome assembly of short reads is now a common step in expression analysis of organisms lacking a reference genome sequence. Several software packages are available to perform this task. Even if their results are of good quality it is still possible to improve them in sever...

Bibliographic Details
Main Authors: Cédric Cabau, Frédéric Escudié, Anis Djari, Yann Guiguen, Julien Bobe, Christophe Klopp
Format: Article
Language: English
Published: PeerJ Inc. 2017-02-01
Series: PeerJ
Subjects: RNA-Seq; De novo assembly; Compaction; Correction; Quality assessment
Online Access: https://peerj.com/articles/2988.pdf
author Cédric Cabau
Frédéric Escudié
Anis Djari
Yann Guiguen
Julien Bobe
Christophe Klopp
collection DOAJ
description Background: De novo transcriptome assembly of short reads is now a common step in expression analysis of organisms lacking a reference genome sequence. Several software packages are available to perform this task. Even if their results are of good quality, it is still possible to improve them in several ways, including redundancy reduction and error correction. Trinity and Oases are two commonly used de novo transcriptome assemblers. The contig sets they produce are of good quality. Still, their compaction (the number of contigs needed to represent the transcriptome) and their quality (chimera and nucleotide error rates) can be improved. Results: We built a de novo RNA-Seq Assembly Pipeline (DRAP) which wraps these two assemblers (Trinity and Oases) in order to improve their results on the above-mentioned criteria. DRAP reduces the number of resulting contigs by a factor of 1.3 to 15, depending on the read set and the assembler used. This article presents seven assembly comparisons showing, in some cases, drastic improvements when using DRAP. DRAP does not significantly impair assembly quality metrics such as read realignment rate or protein reconstruction counts. Conclusion: Transcriptome assembly is a challenging computational task. Even though good solutions are already available to end-users, these solutions can still be improved while conserving the overall representation and quality of the assembly. The de novo RNA-Seq Assembly Pipeline (DRAP) is an easy-to-use software package that produces compact and corrected transcript sets. DRAP is free, open-source and available under the GPL v3 license at http://www.sigenae.org/drap.
first_indexed 2024-03-09T06:33:01Z
format Article
id doaj.art-6f05f9f61036480b81de3015a0d071a3
institution Directory Open Access Journal
issn 2167-8359
language English
last_indexed 2024-03-09T06:33:01Z
publishDate 2017-02-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj.art-6f05f9f61036480b81de3015a0d071a3
2023-12-03T11:02:52Z
eng
PeerJ Inc.
PeerJ
2167-8359
2017-02-01
Volume 5, article e2988
DOI 10.7717/peerj.2988
Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies
Cédric Cabau (SIGENAE, GenPhySE, Université de Toulouse, INRA, INPT, ENV, Castanet Tolosan, France)
Frédéric Escudié (Plate-forme bio-informatique Genotoul, Mathématiques et Informatique Appliquées de Toulouse, INRA, Castanet Tolosan, France)
Anis Djari (Laboratoire Génomique et Biotechnologie du Fruit, UMR990 INRA/INP-ENSAT, Auzeville, France)
Yann Guiguen (UR1037 Fish Physiology and Genomics, INRA, Rennes, France)
Julien Bobe (UR1037 Fish Physiology and Genomics, INRA, Rennes, France)
Christophe Klopp (SIGENAE, GenPhySE, Université de Toulouse, INRA, INPT, ENV, Castanet Tolosan, France)
https://peerj.com/articles/2988.pdf
RNA-Seq
De novo assembly
Compaction
Correction
Quality assessment
title Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies
topic RNA-Seq
De novo assembly
Compaction
Correction
Quality assessment
url https://peerj.com/articles/2988.pdf