An alignment-free heuristic for fast sequence comparisons with applications to phylogeny reconstruction

Bibliographic Details
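Main Authors: Sriram P. Chockalingam, Jodh Pannu, Sahar Hooshmand, Sharma V. Thankachan, Srinivas Aluru
Affiliations: Institute for Data Engineering and Science, Georgia Institute of Technology (Chockalingam, Aluru); Department of Computer Science, University of Central Florida (Pannu, Hooshmand, Thankachan)
Format: Article
Language: English
Published: BMC, 2020-11-01
Series: BMC Bioinformatics
ISSN: 1471-2105
DOI: 10.1186/s12859-020-03738-5
Collection: Directory of Open Access Journals (DOAJ)
Subjects: Alignment-free methods; Sequence comparison; Phylogeny reconstruction
Online Access: http://link.springer.com/article/10.1186/s12859-020-03738-5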

Abstract

Background: Alignment-free methods for sequence comparison have become popular in many bioinformatics applications, particularly for estimating the sequence similarity measures used to construct phylogenetic trees. Recently, the average common substring measure, ACS, and its k-mismatch counterpart, $\mathrm{ACS}_k$, have been shown to produce results as effective as multiple-sequence-alignment-based methods for phylogenetic tree reconstruction. Because computing $\mathrm{ACS}_k$ takes $O(n \log^k n)$ time and is therefore impractical for large datasets, multiple heuristics that approximate $\mathrm{ACS}_k$ have been introduced.

Results: In this paper, we present a novel linear-time heuristic to approximate $\mathrm{ACS}_k$. It is faster than computing the exact $\mathrm{ACS}_k$, and its values are closer to the exact $\mathrm{ACS}_k$ values than those of previously published linear-time greedy heuristics. Using four real datasets containing both DNA and protein sequences, we evaluate our algorithm in terms of accuracy and runtime, and demonstrate its applicability to phylogeny reconstruction. Our algorithm provides better accuracy than previously published heuristic methods, while performing comparably when applied to phylogeny reconstruction.

Conclusions: Our method produces a better approximation of $\mathrm{ACS}_k$ and is applicable to the alignment-free comparison of biological sequences at highly competitive speed. The algorithm is implemented in the Rust programming language, and the source code is available at https://github.com/srirampc/adyar-rs.
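To make the measure concrete, here is a minimal brute-force sketch of a k-mismatch average common substring score. It assumes the common formulation in which each position i of sequence A contributes the length of its longest prefix that matches some substring of B with at most k mismatches (Hamming-style, no gaps), and these lengths are averaged over |A|. The function names `lcp_k` and `acs_k` are hypothetical. This is neither the authors' linear-time heuristic nor the exact $O(n \log^k n)$ algorithm, and it omits the length normalization and symmetrization that ACS-based distances typically apply before tree building.

```rust
/// Length of the longest common prefix of `a` and `b` that contains
/// at most `k` mismatching positions (Hamming-style comparison, no gaps).
fn lcp_k(a: &[u8], b: &[u8], k: usize) -> usize {
    let mut mismatches = 0;
    let mut len = 0;
    // `zip` stops at the end of the shorter slice.
    for (x, y) in a.iter().zip(b.iter()) {
        if x != y {
            mismatches += 1;
            if mismatches > k {
                break; // the (k+1)-th mismatch ends the match
            }
        }
        len += 1;
    }
    len
}

/// Assumed formulation (illustration only): for every suffix of `a`, take the
/// longest k-mismatch common prefix with any suffix of `b`, then average over
/// all positions of `a`. Brute force, roughly O(|a| * |b| * L) time.
fn acs_k(a: &[u8], b: &[u8], k: usize) -> f64 {
    if a.is_empty() {
        return 0.0;
    }
    let total: usize = (0..a.len())
        .map(|i| {
            (0..b.len())
                .map(|j| lcp_k(&a[i..], &b[j..], k))
                .max()
                .unwrap_or(0) // `b` is empty: no match is possible
        })
        .sum();
    total as f64 / a.len() as f64
}

fn main() {
    let a = b"ACGTACGT";
    let b = b"ACGAACGT";
    for k in 0..=2 {
        println!("k = {}: average = {:.3}", k, acs_k(a, b, k));
    }
}
```

Setting k = 0 reduces the score to the exact-match ACS average. The nested loops over all position pairs are what make naive evaluation impractical at genome scale, which is why exact methods rely on suffix-based index structures and why fast approximations of $\mathrm{ACS}_k$ are attractive.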