A completeness-independent method for pre-selection of closely related genomes for species delineation in prokaryotes

Abstract Background Whole-genome approaches are widely preferred for species delineation in prokaryotes. However, these methods require pairwise alignments and calculations at the whole-genome level and thus are computationally intensive. To address this problem, a strategy consisting of sieving (pr...

Bibliographic Details
Main Authors: Yizhuang Zhou, Jifang Zheng, Yepeng Wu, Wenting Zhang, Junfei Jin
Format: Article
Language: English
Published: BMC 2020-02-01
Series: BMC Genomics
Subjects: Tetranucleotide, Composition, Taxonomy, Species delineation, FRAGTE, Metagenomic binning
Online Access: http://link.springer.com/article/10.1186/s12864-020-6597-x
author Yizhuang Zhou
Jifang Zheng
Yepeng Wu
Wenting Zhang
Junfei Jin
collection DOAJ
description Abstract. Background: Whole-genome approaches are widely preferred for species delineation in prokaryotes. However, these methods require pairwise alignments and calculations at the whole-genome level and are therefore computationally intensive. To address this problem, a strategy consisting of sieving (pre-selecting closely related genomes) followed by alignment and calculation has been proposed. Results: Here, we first test a published approach called “genome-wide tetranucleotide frequency correlation coefficient” (TETRA), which is specially tailored for sieving. Our results show that sieving by TETRA requires >40% completeness for both genomes of a pair to yield >95% sensitivity, indicating that TETRA is completeness-dependent. Accordingly, we develop a novel algorithm called “fragment tetranucleotide frequency correlation coefficient” (FRAGTE), which uses fragments rather than whole genomes for sieving. Our results show that FRAGTE achieves ~100% sensitivity and high specificity on simulated genomes, real genomes and metagenome-assembled genomes, demonstrating that FRAGTE is completeness-independent. Additionally, FRAGTE passes fewer genomes on to subsequent alignment and calculation, which greatly improves the computational efficiency of the process after sieving. FRAGTE also reduces the computational cost of the sieving process itself. Consequently, FRAGTE greatly improves runtime efficiency both during sieving and after sieving (subsequent alignment and calculation), and the two gains together accelerate genome-wide species delineation. Conclusions: FRAGTE is a completeness-independent algorithm for sieving. Owing to its high sensitivity, high specificity, greatly reduced number of sieved genomes and greatly improved runtime, FRAGTE will help whole-genome approaches facilitate taxonomic studies in prokaryotes.
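The idea behind a tetranucleotide frequency correlation can be illustrated with a minimal sketch. This is not the authors' implementation of TETRA or FRAGTE: it correlates plain tetranucleotide frequencies (published TETRA implementations may normalise the counts differently), and the function names, the `fragment_len` parameter and the choice of a leading fragment are all hypothetical, since the abstract does not say how FRAGTE selects fragments.

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order ("AAAA", "AAAC", ...).
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freqs(seq):
    """Return the relative frequency of each tetranucleotide in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotides")
    return [counts[k] / total for k in KMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def tetra_correlation(seq_a, seq_b):
    """Correlate the tetranucleotide profiles of two whole sequences."""
    return pearson(tetra_freqs(seq_a), tetra_freqs(seq_b))


def fragment_correlation(seq_a, seq_b, fragment_len):
    """Correlate profiles computed from a fragment of each sequence.

    Assumption: the leading fragment_len bases stand in for a fragment;
    this is a placeholder, not the selection rule used by FRAGTE.
    """
    return tetra_correlation(seq_a[:fragment_len], seq_b[:fragment_len])
```

In a sieving step of this kind, genome pairs whose coefficient falls below a chosen cutoff would be skipped before the expensive alignment stage; the cutoff itself is not given in this record.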
first_indexed 2024-12-22T16:33:07Z
format Article
id doaj.art-3efa77e9ed0f4845810b1f46010f9e3e
institution Directory of Open Access Journals
issn 1471-2164
language English
last_indexed 2024-12-22T16:33:07Z
publishDate 2020-02-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-3efa77e9ed0f4845810b1f46010f9e3e 2022-12-21T18:20:01Z eng
BMC, BMC Genomics, ISSN 1471-2164, 2020-02-01, volume 21, issue 1, pages 1-16
doi:10.1186/s12864-020-6597-x
A completeness-independent method for pre-selection of closely related genomes for species delineation in prokaryotes
Yizhuang Zhou (Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University)
Jifang Zheng (Guangxi Key Laboratory of Tumor Immunology and Microenvironmental Regulation, Guilin Medical University)
Yepeng Wu (China-USA Lipids in Health and Disease Research Center, Guilin Medical University)
Wenting Zhang (Guangxi Key Laboratory of Tumor Immunology and Microenvironmental Regulation, Guilin Medical University)
Junfei Jin (Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University)
http://link.springer.com/article/10.1186/s12864-020-6597-x
title A completeness-independent method for pre-selection of closely related genomes for species delineation in prokaryotes
topic Tetranucleotide
Composition
Taxonomy
Species delineation
FRAGTE
Metagenomic binning
url http://link.springer.com/article/10.1186/s12864-020-6597-x