A completeness-independent method for pre-selection of closely related genomes for species delineation in prokaryotes
Abstract. Background: Whole-genome approaches are widely preferred for species delineation in prokaryotes. However, these methods require pairwise alignments and calculations at the whole-genome level and thus are computationally intensive. To address this problem, a strategy consisting of sieving (pr...
Main Authors: | Yizhuang Zhou, Jifang Zheng, Yepeng Wu, Wenting Zhang, Junfei Jin |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2020-02-01 |
Series: | BMC Genomics |
Subjects: | Tetranucleotide; Composition; Taxonomy; Species delineation; FRAGTE; Metagenomic binning |
Online Access: | http://link.springer.com/article/10.1186/s12864-020-6597-x |
_version_ | 1819158970412040192 |
---|---|
author | Yizhuang Zhou; Jifang Zheng; Yepeng Wu; Wenting Zhang; Junfei Jin |
author_sort | Yizhuang Zhou |
collection | DOAJ |
description | Abstract. Background: Whole-genome approaches are widely preferred for species delineation in prokaryotes. However, these methods require pairwise alignments and calculations at the whole-genome level and are therefore computationally intensive. To address this problem, a strategy consisting of sieving (pre-selecting closely related genomes) followed by alignment and calculation has been proposed. Results: Here, we first test a published approach called “genome-wide tetranucleotide frequency correlation coefficient” (TETRA), which is specifically tailored for sieving. Our results show that sieving by TETRA requires >40% completeness for both genomes of a pair to yield >95% sensitivity, indicating that TETRA is completeness-dependent. Accordingly, we develop a novel algorithm called “fragment tetranucleotide frequency correlation coefficient” (FRAGTE), which uses fragments rather than whole genomes for sieving. Our results show that FRAGTE achieves ~100% sensitivity and high specificity on simulated genomes, real genomes and metagenome-assembled genomes, demonstrating that FRAGTE is completeness-independent. Additionally, FRAGTE passes fewer genomes on to subsequent alignment and calculation, which greatly improves the computational efficiency of the steps after sieving. FRAGTE also reduces the computational cost of the sieving step itself. Consequently, FRAGTE substantially improves run efficiency both for sieving and for the steps after sieving (subsequent alignment and calculation), which together accelerates genome-wide species delineation. Conclusions: FRAGTE is a completeness-independent algorithm for sieving. Owing to its high sensitivity, high specificity, greatly reduced number of sieved genomes and greatly improved runtime, FRAGTE will help whole-genome approaches facilitate taxonomic studies in prokaryotes. |
first_indexed | 2024-12-22T16:33:07Z |
format | Article |
id | doaj.art-3efa77e9ed0f4845810b1f46010f9e3e |
institution | Directory of Open Access Journals |
issn | 1471-2164 |
language | English |
last_indexed | 2024-12-22T16:33:07Z |
publishDate | 2020-02-01 |
publisher | BMC |
record_format | Article |
series | BMC Genomics |
spelling | doaj.art-3efa77e9ed0f4845810b1f46010f9e3e; 2022-12-21T18:20:01Z; eng; BMC; BMC Genomics; 1471-2164; 2020-02-01; volume 21, issue 1, pages 1-16; 10.1186/s12864-020-6597-x; A completeness-independent method for pre-selection of closely related genomes for species delineation in prokaryotes; Yizhuang Zhou (Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University); Jifang Zheng (Guangxi Key Laboratory of Tumor Immunology and Microenvironmental Regulation, Guilin Medical University); Yepeng Wu (China-USA Lipids in Health and Disease Research Center, Guilin Medical University); Wenting Zhang (Guangxi Key Laboratory of Tumor Immunology and Microenvironmental Regulation, Guilin Medical University); Junfei Jin (Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University); http://link.springer.com/article/10.1186/s12864-020-6597-x; Tetranucleotide; Composition; Taxonomy; Species delineation; FRAGTE; Metagenomic binning |
title | A completeness-independent method for pre-selection of closely related genomes for species delineation in prokaryotes |
topic | Tetranucleotide; Composition; Taxonomy; Species delineation; FRAGTE; Metagenomic binning |
url | http://link.springer.com/article/10.1186/s12864-020-6597-x |
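
The description above names a "tetranucleotide frequency correlation coefficient" as the sieving statistic. The following Python sketch illustrates the general idea only: it counts tetranucleotides on both strands of two sequences and reports the Pearson correlation of the resulting frequency vectors. It is a simplification under stated assumptions, not the authors' implementation. The published TETRA statistic is generally described as correlating z-scores of tetranucleotide over- and under-representation rather than raw frequencies, and the fragment-selection scheme that distinguishes FRAGTE is not described in this record, so neither is reproduced here. All function names (`tetra_frequencies`, `tetra_correlation`, and so on) are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

BASES = "ACGT"
# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]
TETRAMER_SET = set(TETRAMERS)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement; non-ACGT symbols pass through unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def tetra_frequencies(seq):
    """Relative frequency of each tetranucleotide, counted on both strands.

    Windows containing ambiguous bases (e.g. N) are skipped.
    Assumption: raw frequencies are used instead of z-scores.
    """
    seq = seq.upper()
    counts = Counter()
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - 3):
            kmer = strand[i:i + 4]
            if kmer in TETRAMER_SET:
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotide")
    return [counts[k] / total for k in TETRAMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def tetra_correlation(seq_a, seq_b):
    """Correlation of the tetranucleotide frequency vectors of two sequences."""
    return pearson(tetra_frequencies(seq_a), tetra_frequencies(seq_b))


if __name__ == "__main__":
    # Toy sequences for illustration only; real genomes are megabases long.
    genome_a = "ACGTTGCAAGGCTAACGGTTACGATCGATCGGCTA" * 20
    genome_b = "ACGTTGCATGGCTAACGGTTACGTTCGATCGGCAA" * 20
    print(round(tetra_correlation(genome_a, genome_b), 4))
```

In a sieving workflow of the kind the description outlines, pairs whose coefficient exceeds some cutoff would be passed on to whole-genome alignment; the cutoff itself is not given in this record and would have to come from the article.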