Closing target trimming and CTTdocker programs for discovering hidden superfamily loci in genomes.

The capacity of genome sequence analysis lags significantly behind rapidly evolving sequencing technologies. Retrieving biologically meaningful information from an ever-increasing amount of genome data would greatly benefit functional genomic studies. For example, the...

Bibliographic Details
Main Authors: Zhihua Hua, Matthew J Early
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0209468
_version_ 1818444269334036480
author Zhihua Hua
Matthew J Early
collection DOAJ
description The capacity of genome sequence analysis lags significantly behind rapidly evolving sequencing technologies. Retrieving biologically meaningful information from an ever-increasing amount of genome data would greatly benefit functional genomic studies. For example, the duplication, organization, evolution, and function of superfamily genes are arguably important in many aspects of life. However, incomplete annotations in many sequenced genomes often lead to biased conclusions in comparative genomic studies of superfamilies. Here, we present a Perl program, called Closing Target Trimming (CTT), for automatically identifying most, if not all, members of a gene family in any sequenced genome on the CentOS 7 platform. To enable broader application on other operating systems, we also created a Docker application package, CTTdocker. Our test data on the F-box gene superfamily showed gene-finding accuracies of 78.2% and 79% in two well-annotated plant genomes, Arabidopsis thaliana and rice, respectively. To further demonstrate the effectiveness of this program, we ran it on 18 plant genomes and five non-plant genomes to compare the expansion of the F-box and BTB superfamilies. The program discovered that on average 12.7% and 9.3% of the total F-box and BTB members, respectively, are new loci in plant genomes, whereas it found only a small number of new members in vertebrate genomes. Therefore, different evolutionary and regulatory mechanisms of Cullin-RING ubiquitin ligases may be present in plants and animals. We also annotated and compared the Pkinase family members across a wide range of organisms, including 10 fungi, 10 metazoa, 10 vertebrates, and 10 additional plants, which were randomly selected from the Ensembl database. Our CTT annotation recovered on average 14% more loci, including pseudogenes, of the Pkinase superfamily in these 40 genomes, demonstrating its robust replicability and scalability in annotating superfamily members in any genome.
first_indexed 2024-12-14T19:13:15Z
format Article
id doaj.art-2fccf121ba96453ab6eac4ce811aa381
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-14T19:13:15Z
publishDate 2019-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-2fccf121ba96453ab6eac4ce811aa381 2022-12-21T22:50:40Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2019-01-01 147e0209468 10.1371/journal.pone.0209468
title Closing target trimming and CTTdocker programs for discovering hidden superfamily loci in genomes.
url https://doi.org/10.1371/journal.pone.0209468