An accurate approximation to the distribution of the length of the longest matching word between two random DNA sequences.
An accurate approximation is derived to the distribution of the length of the longest matching word present between two random DNA sequences of finite length, using only elementary probability arguments. The distribution is shown to be consistent with previous asymptotic results for the mean and var...
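The quantity in the abstract, the length of the longest word (contiguous run of bases) found in both of two random sequences, can be explored by simulation. The sketch below is not the paper's approximation. It is a Monte Carlo estimate of the same distribution, assuming independent, equally likely bases and same-strand matches only. The function names `longest_common_word`, `random_dna` and `empirical_distribution` are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def longest_common_word(a: str, b: str) -> int:
    """Length of the longest contiguous word occurring in both a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: match length ending at previous base of a and b[j-1]
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                # Extend the match that ended one base earlier in both sequences.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best = cur[j]
        prev = cur
    return best


def random_dna(n: int, rng: random.Random) -> str:
    """Assumption: bases are independent and uniform over A, C, G, T."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def empirical_distribution(m: int, n: int, trials: int, seed: int = 0) -> dict:
    """Estimate P(longest match = k) for random sequences of lengths m and n."""
    rng = random.Random(seed)
    counts = Counter(
        longest_common_word(random_dna(m, rng), random_dna(n, rng))
        for _ in range(trials)
    )
    return {k: counts[k] / trials for k in sorted(counts)}


if __name__ == "__main__":
    for k, p in empirical_distribution(200, 200, 500).items():
        print(k, round(p, 3))
```

An empirical table like this could be compared against any closed-form approximation. Modelling the non-independence of neighbouring bases mentioned in the abstract would need a different sequence generator, for example a Markov chain.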
Main Authors: | Mott, R; Kirkwood, T; Curnow, R |
---|---|
Format: | Journal article |
Language: | English |
Published: | 1990 |
_version_ | 1826278186138206208 |
---|---|
author | Mott, R; Kirkwood, T; Curnow, R |
collection | OXFORD |
description | An accurate approximation is derived to the distribution of the length of the longest matching word present between two random DNA sequences of finite length, using only elementary probability arguments. The distribution is shown to be consistent with previous asymptotic results for the mean and variance of longest common words. The application of the distribution to assessing the statistical significance of sequence similarities is considered. It is shown how the distribution can be modified to take account of non-independence of neighbouring bases in real sequences. |
first_indexed | 2024-03-06T23:40:11Z |
format | Journal article |
id | oxford-uuid:6f0943bf-8a3e-4a7e-9ab4-adc0b7881cff |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-06T23:40:11Z |
publishDate | 1990 |
record_format | dspace |
spelling | oxford-uuid:6f0943bf-8a3e-4a7e-9ab4-adc0b7881cff; 2022-03-26T19:28:13Z; An accurate approximation to the distribution of the length of the longest matching word between two random DNA sequences.; Journal article; http://purl.org/coar/resource_type/c_dcae04bc; uuid:6f0943bf-8a3e-4a7e-9ab4-adc0b7881cff; English; Symplectic Elements at Oxford; 1990; Mott, R; Kirkwood, T; Curnow, R; An accurate approximation is derived to the distribution of the length of the longest matching word present between two random DNA sequences of finite length, using only elementary probability arguments. The distribution is shown to be consistent with previous asymptotic results for the mean and variance of longest common words. The application of the distribution to assessing the statistical significance of sequence similarities is considered. It is shown how the distribution can be modified to take account of non-independence of neighbouring bases in real sequences. |
title | An accurate approximation to the distribution of the length of the longest matching word between two random DNA sequences. |