TriAnnot: a versatile and high performance pipeline for the automated annotation of plant genomes
In support of the international effort to obtain a reference sequence of the bread wheat genome and to provide plant communities dealing with large and complex genomes with a versatile, easy-to-use online automated tool for annotation, we have developed the TriAnnot pipeline. Its modular architectur...
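Main Authors: | Philippe Leroy, Nicolas Guilhot, Hiroaki Sakai, Aurélien Bernard, Frédéric Choulet, Sébastien Theil, Sébastien Reboux, Naoki Amano, Timothée Flutre, Céline Pelegrin, Hajime Ohyanagi, Michael Seidel, Franck Giacomoni, Matthieu Reichstadt, Michael Alaux, Emmanuelle Gicquello, Fabrice Legeai, Lorenzo Cerutti, Hisataka Numa, Tsuyoshi Tanaka, Klaus Mayer, Takeshi Itoh, Hadi Quesneville, Catherine Feuillet |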
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2012-01-01 |
Series: | Frontiers in Plant Science |
Subjects: | gene; Cluster; transposable elements; wheat; pipeline; plant genomes |
Online Access: | http://journal.frontiersin.org/Journal/10.3389/fpls.2012.00005/full |
_version_ | 1828256558095532032 |
---|---|
author | Philippe Leroy, Nicolas Guilhot, Hiroaki Sakai, Aurélien Bernard, Frédéric Choulet, Sébastien Theil, Sébastien Reboux, Naoki Amano, Timothée Flutre, Céline Pelegrin, Hajime Ohyanagi, Michael Seidel, Franck Giacomoni, Matthieu Reichstadt, Michael Alaux, Emmanuelle Gicquello, Fabrice Legeai, Lorenzo Cerutti, Hisataka Numa, Tsuyoshi Tanaka, Klaus Mayer, Takeshi Itoh, Hadi Quesneville, Catherine Feuillet |
author_sort | Philippe Leroy |
collection | DOAJ |
description | In support of the international effort to obtain a reference sequence of the bread wheat genome and to provide plant communities dealing with large and complex genomes with a versatile, easy-to-use online automated tool for annotation, we have developed the TriAnnot pipeline. Its modular architecture allows for the annotation and masking of transposable elements, the structural and functional annotation of protein-coding genes with evidence-based quality indexing, and the identification of conserved non-coding sequences and molecular markers. The TriAnnot pipeline is parallelized on a 712-CPU computing cluster that can run a 1 Gb sequence annotation in less than five days. It is accessible through a web interface for small-scale analyses or through a server for large-scale annotations. The performance of TriAnnot was evaluated in terms of sensitivity, specificity, and general fitness using curated reference sequence sets from rice and wheat. In less than 8 hours, TriAnnot was able to predict more than 83% of the 3,748 CDS from rice chromosome 1 with a fitness of 67.4%. On a set of 12 reference Mb-sized contigs from wheat chromosome 3B, TriAnnot predicted and annotated 93.3% of the genes, of which 54% were perfectly identified in accordance with the reference annotation. It also allowed the curation of 12 genes based on new biological evidence, increasing the percentage of perfect gene predictions to 63%. TriAnnot systematically showed a higher fitness than other annotation pipelines that are not optimized for wheat. As it is easily adaptable to the annotation of other plant genomes, TriAnnot should become a useful resource for the annotation of large and complex genomes in the future. |
first_indexed | 2024-04-13T02:29:45Z |
format | Article |
id | doaj.art-cd9f4da0a885464c9647ee22b33a390d |
institution | Directory of Open Access Journals |
issn | 1664-462X |
language | English |
last_indexed | 2024-04-13T02:29:45Z |
publishDate | 2012-01-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Plant Science |
spelling | doaj.art-cd9f4da0a885464c9647ee22b33a390d; 2022-12-22T03:06:37Z; eng; Frontiers Media S.A.; Frontiers in Plant Science; 1664-462X; 2012-01-01; 3; 10.3389/fpls.2012.00005; 16821; TriAnnot: a versatile and high performance pipeline for the automated annotation of plant genomes; Philippe Leroy (Institut National de la Recherche Agronomique); Nicolas Guilhot (Institut National de la Recherche Agronomique); Hiroaki Sakai (National Institute of Agrobiological Sciences); Aurélien Bernard (Institut National de la Recherche Agronomique); Frédéric Choulet (Institut National de la Recherche Agronomique); Sébastien Theil (Institut National de la Recherche Agronomique); Sébastien Reboux (Institut National de la Recherche Agronomique); Naoki Amano (National Institute of Agrobiological Sciences); Timothée Flutre (Institut National de la Recherche Agronomique); Céline Pelegrin (Institut National de la Recherche Agronomique); Hajime Ohyanagi (Mitsubishi Space Software Co., Ltd.); Michael Seidel (MIPS/IBIS); Franck Giacomoni (Institut National de la Recherche Agronomique); Matthieu Reichstadt (Institut National de la Recherche Agronomique); Michael Alaux (Institut National de la Recherche Agronomique); Emmanuelle Gicquello (Institut National de la Recherche Agronomique); Fabrice Legeai (Institut National de la Recherche Agronomique); Lorenzo Cerutti (Swiss Institute of Bioinformatics); Hisataka Numa (National Institute of Agrobiological Sciences); Tsuyoshi Tanaka (National Institute of Agrobiological Sciences); Klaus Mayer (MIPS/IBIS); Takeshi Itoh (National Institute of Agrobiological Sciences); Hadi Quesneville (Institut National de la Recherche Agronomique); Catherine Feuillet (Institut National de la Recherche Agronomique); http://journal.frontiersin.org/Journal/10.3389/fpls.2012.00005/full; gene; Cluster; transposable elements; wheat; pipeline; plant genomes |
title | TriAnnot: a versatile and high performance pipeline for the automated annotation of plant genomes |
topic | gene; Cluster; transposable elements; wheat; pipeline; plant genomes |
url | http://journal.frontiersin.org/Journal/10.3389/fpls.2012.00005/full |