An accurate approximation to the distribution of the length of the longest matching word between two random DNA sequences.

An accurate approximation is derived to the distribution of the length of the longest matching word present between two random DNA sequences of finite length, using only elementary probability arguments. The distribution is shown to be consistent with previous asymptotic results for the mean and var...

Bibliographic details
Main Authors: Mott, R; Kirkwood, T; Curnow, R
Format: Journal article
Language: English
Published: 1990
Collection: OXFORD
Description: An accurate approximation is derived to the distribution of the length of the longest matching word present between two random DNA sequences of finite length, using only elementary probability arguments. The distribution is shown to be consistent with previous asymptotic results for the mean and variance of longest common words. The application of the distribution to assessing the statistical significance of sequence similarities is considered. It is shown how the distribution can be modified to take account of non-independence of neighbouring bases in real sequences.
Record ID: oxford-uuid:6f0943bf-8a3e-4a7e-9ab4-adc0b7881cff
Institution: University of Oxford
First indexed: 2024-03-06T23:40:11Z
Last indexed: 2024-03-06T23:40:11Z
Record format: dspace