TriAnnot: a versatile and high performance pipeline for the automated annotation of plant genomes

In support of the international effort to obtain a reference sequence of the bread wheat genome and to provide plant communities dealing with large and complex genomes with a versatile, easy-to-use online automated tool for annotation, we have developed the TriAnnot pipeline. Its modular architectur...

Bibliographic Details
Main Authors: Philippe Leroy, Nicolas Guilhot, Hiroaki Sakai, Aurélien Bernard, Frédéric Choulet, Sébastien Theil, Sébastien Reboux, Naoki Amano, Timothée Flutre, Céline Pelegrin, Hajime Ohyanagi, Michael Seidel, Franck Giacomoni, Matthieu Reichstadt, Michael Alaux, Emmanuelle Gicquello, Fabrice Legeai, Lorenzo Cerutti, Hisataka Numa, Tsuyoshi Tanaka, Klaus Mayer, Takeshi Itoh, Hadi Quesneville, Catherine Feuillet
Format: Article
Language: English
Published: Frontiers Media S.A. 2012-01-01
Series: Frontiers in Plant Science
Subjects: gene, Cluster, transposable elements, wheat, pipeline, plant genomes
Online Access: http://journal.frontiersin.org/Journal/10.3389/fpls.2012.00005/full
_version_ 1828256558095532032
author Philippe Leroy
Nicolas Guilhot
Hiroaki Sakai
Aurélien Bernard
Frédéric Choulet
Sébastien Theil
Sébastien Reboux
Naoki Amano
Timothée Flutre
Céline Pelegrin
Hajime Ohyanagi
Michael Seidel
Franck Giacomoni
Matthieu Reichstadt
Michael Alaux
Emmanuelle Gicquello
Fabrice Legeai
Lorenzo Cerutti
Hisataka Numa
Tsuyoshi Tanaka
Klaus Mayer
Takeshi Itoh
Hadi Quesneville
Catherine Feuillet
author_sort Philippe Leroy
collection DOAJ
description In support of the international effort to obtain a reference sequence of the bread wheat genome and to provide plant communities dealing with large and complex genomes with a versatile, easy-to-use online automated tool for annotation, we have developed the TriAnnot pipeline. Its modular architecture allows for the annotation and masking of transposable elements, the structural and functional annotation of protein-coding genes with evidence-based quality indexing, and the identification of conserved non-coding sequences and molecular markers. The TriAnnot pipeline is parallelized on a 712-CPU computing cluster that can run a 1 Gb sequence annotation in less than five days. It is accessible through a web interface for small-scale analyses or through a server for large-scale annotations. The performance of TriAnnot was evaluated in terms of sensitivity, specificity, and general fitness using curated reference sequence sets from rice and wheat. In less than 8 hours, TriAnnot was able to predict more than 83% of the 3,748 CDS from rice chromosome 1 with a fitness of 67.4%. On a set of 12 reference Mb-sized contigs from wheat chromosome 3B, TriAnnot predicted and annotated 93.3% of the genes, of which 54% were perfectly identified in accordance with the reference annotation. It also allowed the curation of 12 genes based on new biological evidence, increasing the percentage of perfect gene prediction to 63%. TriAnnot systematically showed a higher fitness than other annotation pipelines that are not improved for wheat. As it is easily adaptable to the annotation of other plant genomes, TriAnnot should become a useful resource for the annotation of large and complex genomes in the future.
first_indexed 2024-04-13T02:29:45Z
format Article
id doaj.art-cd9f4da0a885464c9647ee22b33a390d
institution Directory of Open Access Journals
issn 1664-462X
language English
last_indexed 2024-04-13T02:29:45Z
publishDate 2012-01-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Plant Science
spelling doaj.art-cd9f4da0a885464c9647ee22b33a390d
2022-12-22T03:06:37Z
eng
Frontiers Media S.A.
Frontiers in Plant Science
1664-462X
2012-01-01
3
10.3389/fpls.2012.00005
16821
TriAnnot: a versatile and high performance pipeline for the automated annotation of plant genomes
Philippe Leroy (Institut National de la Recherche Agronomique)
Nicolas Guilhot (Institut National de la Recherche Agronomique)
Hiroaki Sakai (National Institute of Agrobiological Sciences)
Aurélien Bernard (Institut National de la Recherche Agronomique)
Frédéric Choulet (Institut National de la Recherche Agronomique)
Sébastien Theil (Institut National de la Recherche Agronomique)
Sébastien Reboux (Institut National de la Recherche Agronomique)
Naoki Amano (National Institute of Agrobiological Sciences)
Timothée Flutre (Institut National de la Recherche Agronomique)
Céline Pelegrin (Institut National de la Recherche Agronomique)
Hajime Ohyanagi (Mitsubishi Space Software Co., Ltd.)
Michael Seidel (MIPS/IBIS)
Franck Giacomoni (Institut National de la Recherche Agronomique)
Matthieu Reichstadt (Institut National de la Recherche Agronomique)
Michael Alaux (Institut National de la Recherche Agronomique)
Emmanuelle Gicquello (Institut National de la Recherche Agronomique)
Fabrice Legeai (Institut National de la Recherche Agronomique)
Lorenzo Cerutti (Swiss Institute of Bioinformatics)
Hisataka Numa (National Institute of Agrobiological Sciences)
Tsuyoshi Tanaka (National Institute of Agrobiological Sciences)
Klaus Mayer (MIPS/IBIS)
Takeshi Itoh (National Institute of Agrobiological Sciences)
Hadi Quesneville (Institut National de la Recherche Agronomique)
Catherine Feuillet (Institut National de la Recherche Agronomique)
http://journal.frontiersin.org/Journal/10.3389/fpls.2012.00005/full
gene
Cluster
transposable elements
wheat
pipeline
plant genomes
title TriAnnot: a versatile and high performance pipeline for the automated annotation of plant genomes
topic gene
Cluster
transposable elements
wheat
pipeline
plant genomes
url http://journal.frontiersin.org/Journal/10.3389/fpls.2012.00005/full