Fast algorithms for computing sequence distances by exhaustive substring composition

<p>Abstract</p> <p>The increasing throughput of sequencing raises growing needs for methods of sequence analysis and comparison on a genomic scale, notably, in connection with phylogenetic tree reconstruction. Such needs are hardly fulfilled by the more traditional measures of sequ...

Full description

Bibliographic Details
Main Authors: Apostolico Alberto, Denas Olgert
Format: Article
Language:English
Published: BMC 2008-10-01
Series:Algorithms for Molecular Biology
Online Access:http://www.almob.org/content/3/1/13
_version_ 1811260851424854016
author Apostolico Alberto
Denas Olgert
author_facet Apostolico Alberto
Denas Olgert
author_sort Apostolico Alberto
collection DOAJ
description <p>Abstract</p> <p>The increasing throughput of sequencing raises growing needs for methods of sequence analysis and comparison on a genomic scale, notably, in connection with phylogenetic tree reconstruction. Such needs are hardly fulfilled by the more traditional measures of sequence similarity and distance, like string edit and gene rearrangement, due to a mixture of epistemological and computational problems. Alternative measures, based on the subword composition of sequences, have emerged in recent years and proved to be both fast and effective in a variety of tested cases. The common denominator of such measures is an underlying information theoretic notion of relative compressibility. Their viability depends critically on computational cost. The present paper describes as a paradigm the extension and efficient implementation of one of the methods in this class. The method is based on the comparison of the frequencies of all subwords in the two input sequences, where frequencies are suitably adjusted to take into account the statistical background.</p>
first_indexed 2024-04-12T18:53:52Z
format Article
id doaj.art-4ae260ea5e4a4b2a9514304ca1644b1e
institution Directory Open Access Journal
issn 1748-7188
language English
last_indexed 2024-04-12T18:53:52Z
publishDate 2008-10-01
publisher BMC
record_format Article
series Algorithms for Molecular Biology
spelling doaj.art-4ae260ea5e4a4b2a9514304ca1644b1e2022-12-22T03:20:24ZengBMCAlgorithms for Molecular Biology1748-71882008-10-01311310.1186/1748-7188-3-13Fast algorithms for computing sequence distances by exhaustive substring compositionApostolico AlbertoDenas Olgert<p>Abstract</p> <p>The increasing throughput of sequencing raises growing needs for methods of sequence analysis and comparison on a genomic scale, notably, in connection with phylogenetic tree reconstruction. Such needs are hardly fulfilled by the more traditional measures of sequence similarity and distance, like string edit and gene rearrangement, due to a mixture of epistemological and computational problems. Alternative measures, based on the subword composition of sequences, have emerged in recent years and proved to be both fast and effective in a variety of tested cases. The common denominator of such measures is an underlying information theoretic notion of relative compressibility. Their viability depends critically on computational cost. The present paper describes as a paradigm the extension and efficient implementation of one of the methods in this class. The method is based on the comparison of the frequencies of all subwords in the two input sequences, where frequencies are suitably adjusted to take into account the statistical background.</p>http://www.almob.org/content/3/1/13
spellingShingle Apostolico Alberto
Denas Olgert
Fast algorithms for computing sequence distances by exhaustive substring composition
Algorithms for Molecular Biology
title Fast algorithms for computing sequence distances by exhaustive substring composition
title_full Fast algorithms for computing sequence distances by exhaustive substring composition
title_fullStr Fast algorithms for computing sequence distances by exhaustive substring composition
title_full_unstemmed Fast algorithms for computing sequence distances by exhaustive substring composition
title_short Fast algorithms for computing sequence distances by exhaustive substring composition
title_sort fast algorithms for computing sequence distances by exhaustive substring composition
url http://www.almob.org/content/3/1/13
work_keys_str_mv AT apostolicoalberto fastalgorithmsforcomputingsequencedistancesbyexhaustivesubstringcomposition
AT denasolgert fastalgorithmsforcomputingsequencedistancesbyexhaustivesubstringcomposition