An alignment-free heuristic for fast sequence comparisons with applications to phylogeny reconstruction
Abstract Background Alignment-free methods for sequence comparisons have become popular in many bioinformatics applications, specifically in the estimation of sequence similarity measures to construct phylogenetic trees. Recently, the average common substring measure, ACS, and its k-mismatch counter...
Main Authors: | Sriram P. Chockalingam, Jodh Pannu, Sahar Hooshmand, Sharma V. Thankachan, Srinivas Aluru |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2020-11-01 |
Series: | BMC Bioinformatics |
Subjects: | Alignment-free methods; Sequence comparison; Phylogeny reconstruction |
Online Access: | http://link.springer.com/article/10.1186/s12859-020-03738-5 |
_version_ | 1828542821218385920 |
---|---|
author | Sriram P. Chockalingam; Jodh Pannu; Sahar Hooshmand; Sharma V. Thankachan; Srinivas Aluru |
author_sort | Sriram P. Chockalingam |
collection | DOAJ |
description | Abstract. Background: Alignment-free methods for sequence comparison have become popular in many bioinformatics applications, specifically in estimating the sequence similarity measures used to construct phylogenetic trees. Recently, the average common substring measure, ACS, and its k-mismatch counterpart, ACS_k, have been shown to produce results as effective as those of multiple-sequence-alignment-based methods for reconstructing phylogenetic trees. Because computing ACS_k exactly takes O(n log^k n) time, which is impractical for large datasets, multiple heuristics that approximate ACS_k have been introduced. Results: In this paper, we present a novel linear-time heuristic to approximate ACS_k. It is faster than computing the exact ACS_k, and its values are closer to the exact ACS_k values than those of previously published linear-time greedy heuristics. Using four real datasets containing both DNA and protein sequences, we evaluate our algorithm in terms of accuracy and runtime, and demonstrate its applicability to phylogeny reconstruction. Our algorithm provides better accuracy than previously published heuristic methods while performing comparably when applied to phylogeny reconstruction. Conclusions: Our method produces a better approximation of ACS_k and is applicable to the alignment-free comparison of biological sequences at highly competitive speed. The algorithm is implemented in the Rust programming language, and the source code is available at https://github.com/srirampc/adyar-rs . |
first_indexed | 2024-12-12T02:04:09Z |
format | Article |
id | doaj.art-96db9fcf7ab54535b9a6b3d9e2ce192f |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-12T02:04:09Z |
publishDate | 2020-11-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-96db9fcf7ab54535b9a6b3d9e2ce192f; 2022-12-22T00:42:06Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2020-11-01; vol. 21, no. S6, pp. 1-12; 10.1186/s12859-020-03738-5; An alignment-free heuristic for fast sequence comparisons with applications to phylogeny reconstruction; Sriram P. Chockalingam (0: Institute for Data Engineering and Science, Georgia Institute of Technology); Jodh Pannu (1: Department of Computer Science, University of Central Florida); Sahar Hooshmand (2: Department of Computer Science, University of Central Florida); Sharma V. Thankachan (3: Department of Computer Science, University of Central Florida); Srinivas Aluru (4: Institute for Data Engineering and Science, Georgia Institute of Technology); http://link.springer.com/article/10.1186/s12859-020-03738-5; Alignment-free methods; Sequence comparison; Phylogeny reconstruction |
title | An alignment-free heuristic for fast sequence comparisons with applications to phylogeny reconstruction |
topic | Alignment-free methods; Sequence comparison; Phylogeny reconstruction |
url | http://link.springer.com/article/10.1186/s12859-020-03738-5 |
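
The abstract in this record names the ACS_k measure without defining it. As a rough illustration only, the sketch below assumes the commonly used definition: for each position i of a sequence x, take the length of the longest substring starting at i that also occurs somewhere in y with at most k mismatches, then average over all positions of x. The function names `lcp_k` and `acs_k` are hypothetical. The code is a brute-force baseline for small inputs; it is neither the linear-time heuristic described in the article nor code taken from the adyar-rs repository.

```rust
/// Length of the longest common prefix of `a` and `b` that tolerates
/// at most `k` mismatching positions (hypothetical helper).
fn lcp_k(a: &[u8], b: &[u8], k: usize) -> usize {
    let mut mismatches = 0;
    let mut len = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        if x != y {
            // Stop before the (k + 1)-th mismatch.
            if mismatches == k {
                break;
            }
            mismatches += 1;
        }
        len += 1;
    }
    len
}

/// Brute-force ACS_k(x, y) under the assumed definition: the average,
/// over every start position i in `x`, of the best `lcp_k` against
/// every start position j in `y`. Roughly cubic time, so only
/// suitable for short sequences.
fn acs_k(x: &[u8], y: &[u8], k: usize) -> f64 {
    if x.is_empty() {
        return 0.0;
    }
    let total: usize = (0..x.len())
        .map(|i| {
            (0..y.len())
                .map(|j| lcp_k(&x[i..], &y[j..], k))
                .max()
                .unwrap_or(0)
        })
        .sum();
    total as f64 / x.len() as f64
}

fn main() {
    // Two toy DNA strings that differ at a single position.
    let x = b"ACGTACGT";
    let y = b"ACGTTCGT";
    println!("ACS_0 = {:.3}", acs_k(x, y, 0));
    println!("ACS_1 = {:.3}", acs_k(x, y, 1));
}
```

Allowing more mismatches can only lengthen each per-position match, so under this definition ACS_k is non-decreasing in k. Turning the averages into a distance for tree building needs an additional normalization step that this record does not describe, so it is left out here.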