COGNATE: comparative gene annotation characterizer


Bibliographic Details
Main Authors: Jeanne Wilbrandt, Bernhard Misof, Oliver Niehuis
Format: Article
Language: English
Published: BMC 2017-07-01
Series: BMC Genomics
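Subjects: Comparative genomics, Protein-coding genes, Gene annotation, Gene repertoires, Gene structure, Standardization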
Online Access: http://link.springer.com/article/10.1186/s12864-017-3870-8
_version_ 1819212181526282240
author Jeanne Wilbrandt
Bernhard Misof
Oliver Niehuis
author_facet Jeanne Wilbrandt
Bernhard Misof
Oliver Niehuis
author_sort Jeanne Wilbrandt
collection DOAJ
description Abstract
Background: The comparison of gene and genome structures across species has the potential to reveal major trends of genome evolution. However, such a comparative approach is currently hampered by a lack of standardization (e.g., Elliott TA, Gregory TR, Philos Trans Royal Soc B: Biol Sci 370:20140331, 2015). For example, testing the hypothesis that the total amount of coding sequences is a reliable measure of potential proteome diversity (Wang M, Kurland CG, Caetano-Anollés G, PNAS 108:11954, 2011) requires the application of standardized definitions of coding sequence and genes to create both comparable and comprehensive data sets and corresponding summary statistics. However, such standard definitions either do not exist or are not consistently applied. These circumstances call for a standard at the descriptive level using a minimum of parameters as well as an undeviating use of standardized terms, and for software that infers the required data under these strict definitions. The acquisition of a comprehensive, descriptive, and standardized set of parameters and summary statistics for genome publications and further analyses can thus greatly benefit from the availability of an easy-to-use standard tool.
Results: We developed a new open-source command-line tool, COGNATE (Comparative Gene Annotation Characterizer), which uses a given genome assembly and its annotation of protein-coding genes for a detailed description of the respective gene and genome structure parameters. Additionally, we revised the standard definitions of gene and genome structures and provide the definitions used by COGNATE as a working draft suggestion for further reference. Complete parameter lists and summary statistics are inferred using this set of definitions to allow downstream analyses and to provide an overview of the genome and gene repertoire characteristics. COGNATE is written in Perl and freely available at the ZFMK homepage (https://www.zfmk.de/en/COGNATE) and on GitHub (https://github.com/ZFMK/COGNATE).
Conclusion: The tool COGNATE allows comparing genome assemblies and structural elements at multiple levels (e.g., scaffold or contig sequence, gene). It clearly enhances comparability between analyses. Thus, COGNATE can provide the important standardization of both genome and gene structure parameter disclosure as well as data acquisition for future comparative analyses. With the establishment of comprehensive descriptive standards and the extensive availability of genomes, an encompassing database will become possible.
first_indexed 2024-12-23T06:38:53Z
format Article
id doaj.art-d3221ff7e78f4ace90b7d9bf1cae008c
institution Directory Open Access Journal
issn 1471-2164
language English
last_indexed 2024-12-23T06:38:53Z
publishDate 2017-07-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-d3221ff7e78f4ace90b7d9bf1cae008c
2022-12-21T17:56:43Z
eng
BMC
BMC Genomics
1471-2164
2017-07-01
18
1
1
10
10.1186/s12864-017-3870-8
COGNATE: comparative gene annotation characterizer
Jeanne Wilbrandt (0)
Bernhard Misof (1)
Oliver Niehuis (2)
(0) Zoologisches Forschungsmuseum Alexander Koenig (ZFMK), Zentrum für Molekulare Biodiversitätsforschung (zmb)
(1) Zoologisches Forschungsmuseum Alexander Koenig (ZFMK), Zentrum für Molekulare Biodiversitätsforschung (zmb)
(2) Abteilung Evolutionsbiologie und Ökologie, Albert-Ludwigs-Universität Freiburg, Institut für Biologie I (Zoologie)
http://link.springer.com/article/10.1186/s12864-017-3870-8
Comparative genomics
Protein-coding genes
Gene annotation
Gene repertoires
Gene structure
Standardization
spellingShingle Jeanne Wilbrandt
Bernhard Misof
Oliver Niehuis
COGNATE: comparative gene annotation characterizer
BMC Genomics
Comparative genomics
Protein-coding genes
Gene annotation
Gene repertoires
Gene structure
Standardization
title COGNATE: comparative gene annotation characterizer
title_full COGNATE: comparative gene annotation characterizer
title_fullStr COGNATE: comparative gene annotation characterizer
title_full_unstemmed COGNATE: comparative gene annotation characterizer
title_short COGNATE: comparative gene annotation characterizer
title_sort cognate comparative gene annotation characterizer
topic Comparative genomics
Protein-coding genes
Gene annotation
Gene repertoires
Gene structure
Standardization
url http://link.springer.com/article/10.1186/s12864-017-3870-8
work_keys_str_mv AT jeannewilbrandt cognatecomparativegeneannotationcharacterizer
AT bernhardmisof cognatecomparativegeneannotationcharacterizer
AT oliverniehuis cognatecomparativegeneannotationcharacterizer