TransFlow: a modular framework for assembling and assessing accurate de novo transcriptomes in non-model organisms
Abstract. Background: Advances in high-throughput sequencing technologies are enabling the de novo assembly of transcriptomes from a growing number of organisms. Some degree of automation and evaluation is required to ensure reproducibility, repeatability and the selection of the best possible tr...
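Main Authors: | Pedro Seoane, Marina Espigares, Rosario Carmona, Álvaro Polonio, Julia Quintana, Enrico Cretazzo, Josefina Bota, Alejandro Pérez-García, Juan de Dios Alché, Luis Gómez, M. Gonzalo Claros |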
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2018-11-01 |
Series: | BMC Bioinformatics |
Subjects: | Transcriptome, Assembling, Workflow, pipeline, PCA, Non-model organism |
Online Access: | http://link.springer.com/article/10.1186/s12859-018-2384-y |
_version_ | 1828832460147785728 |
---|---|
author | Pedro Seoane, Marina Espigares, Rosario Carmona, Álvaro Polonio, Julia Quintana, Enrico Cretazzo, Josefina Bota, Alejandro Pérez-García, Juan de Dios Alché, Luis Gómez, M. Gonzalo Claros |
author_facet | Pedro Seoane, Marina Espigares, Rosario Carmona, Álvaro Polonio, Julia Quintana, Enrico Cretazzo, Josefina Bota, Alejandro Pérez-García, Juan de Dios Alché, Luis Gómez, M. Gonzalo Claros |
author_sort | Pedro Seoane |
collection | DOAJ |
description | Abstract. Background: Advances in high-throughput sequencing technologies are enabling the de novo assembly of transcriptomes from a growing number of organisms. Some degree of automation and evaluation is required to ensure reproducibility, repeatability and the selection of the best possible transcriptome. Workflows and pipelines are becoming an absolute requirement for this purpose, but evaluating de novo transcriptome assemblies in organisms that lack a sequenced genome remains an unsolved problem. An automated, reproducible and flexible framework called TransFlow is described to accomplish this task. Results: TransFlow, with its five independent modules, was designed to build different workflows depending on the nature of the original reads. This architecture supports different combinations of Illumina and Roche/454 sequencing data and can be extended to other sequencing platforms. Its capabilities are illustrated with the selection of reliable plant reference transcriptomes and the assembly of six transcriptomes (three case studies for grapevine leaves, olive tree pollen and chestnut stem, and another three for haustorium, epiphytic structures and their combination in the phytopathogenic fungus Podosphaera xanthii). The Arabidopsis and poplar transcriptomes proved to be the best references. A common result across the de novo assemblies is that Illumina paired-end reads of 100 nt assembled with OASES can provide reliable transcriptomes, while the contribution of longer reads is noticeable only when they complement a set of short single reads. Conclusions: TransFlow can handle up to 181 different assembly strategies. Evaluation based on principal component analysis lets it adapt to different sets of reads and provide a suitable transcriptome for each combination of reads and assemblers. As a result, each case study has its own behaviour and prioritises different evaluation parameters, and the framework gives an objective, automated way of detecting the best transcriptome within a pool of them. Sequencing data type and quantity (preferably several hundred million reads of 2×100 nt or longer), assemblers (OASES for Illumina; MIRA4 and EULER-SR reconciled with CAP3 for Roche/454) and strategy (preferably scaffolding with OASES, and probably merging with Roche/454 reads when available) emerge as the most influential factors. |
first_indexed | 2024-12-12T16:54:35Z |
format | Article |
id | doaj.art-2459d4873f294f74ab85d064b04998fe |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-12T16:54:35Z |
publishDate | 2018-11-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-2459d4873f294f74ab85d064b04998fe; 2022-12-22T00:18:15Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2018-11-01; 19S1497114; 10.1186/s12859-018-2384-y; TransFlow: a modular framework for assembling and assessing accurate de novo transcriptomes in non-model organisms; Pedro Seoane [0], Marina Espigares [1], Rosario Carmona [2], Álvaro Polonio [3], Julia Quintana [4], Enrico Cretazzo [5], Josefina Bota [6], Alejandro Pérez-García [7], Juan de Dios Alché [8], Luis Gómez [9], M. Gonzalo Claros [10]; [0] Departamento de Biología Molecular y Bioquímica, Universidad de Málaga; [1] Departamento de Biología Molecular y Bioquímica, Universidad de Málaga; [2] Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants. Estación Experimental del Zaidín. CSIC; [3] Departamento de Microbiología, and Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora”, Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC); [4] Department of Chemistry and Biochemistry, Worcester Polytechnic Institute; [5] Instituto Andaluz de Investigación y Formación Agraria (IFAPA), Centro de Churriana; [6] Grup de Recerca en Biologia de les Plantes en Condicions Mediterrànies, Departament de Biologia, Universitat de les Illes Balears; [7] Departamento de Microbiología, and Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora”, Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC); [8] Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants. Estación Experimental del Zaidín. CSIC; [9] Departamento de Sistemas y Recursos Naturales, ETSI Forestal, de Montes y del Medio Natural, Universidad Politécnica de Madrid; [10] Departamento de Biología Molecular y Bioquímica, Universidad de Málaga; http://link.springer.com/article/10.1186/s12859-018-2384-y; Transcriptome; Assembling; Workflow; pipeline; PCA; Non-model organism |
title | TransFlow: a modular framework for assembling and assessing accurate de novo transcriptomes in non-model organisms |
title_sort | transflow a modular framework for assembling and assessing accurate de novo transcriptomes in non model organisms |
topic | Transcriptome, Assembling, Workflow, pipeline, PCA, Non-model organism |
url | http://link.springer.com/article/10.1186/s12859-018-2384-y |