An accurate approximation to the distribution of the length of the longest matching word between two random DNA sequences.

Bibliographic Details
Main Authors:	Mott, R, Kirkwood, T, Curnow, R
Format:	Journal article
Language:	English
Published:	1990

Description:	An accurate approximation is derived to the distribution of the length of the longest matching word present between two random DNA sequences of finite length, using only elementary probability arguments. The distribution is shown to be consistent with previous asymptotic results for the mean and variance of longest common words. The application of the distribution to assessing the statistical significance of sequence similarities is considered. It is shown how the distribution can be modified to take account of non-independence of neighbouring bases in real sequences.
Institution:	University of Oxford
Collection:	OXFORD
Record ID:	oxford-uuid:6f0943bf-8a3e-4a7e-9ab4-adc0b7881cff
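The quantity this record describes is the length of the longest word (exact common substring) shared by two random DNA sequences. The minimal Python sketch below is illustrative only and assumes independent, uniformly distributed bases. It estimates that length's distribution by Monte Carlo and compares it with a textbook Poisson-clumping heuristic, P(longest match < k) ≈ exp(-(1 - p) m n p^k), where p is the probability that two random bases agree. That heuristic is a stand-in, not the approximation derived in the paper, which this record does not reproduce. All function names are hypothetical, and the sketch does not model the non-independence of neighbouring bases mentioned in the abstract.

```python
import math
import random
from typing import List


def longest_common_word(a: str, b: str) -> int:
    """Length of the longest exact common substring ("matching word") of a and b.

    O(len(a) * len(b)) dynamic programme: cur[j] holds the length of the
    longest common suffix of a[:i] and b[:j].
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        ai = a[i - 1]
        for j in range(1, len(b) + 1):
            if ai == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best = cur[j]
        prev = cur
    return best


def poisson_clump_cdf(k: int, m: int, n: int, p: float = 0.25) -> float:
    """Heuristic P(longest match < k) ~ exp(-(1 - p) * m * n * p**k).

    p = probability two random bases agree (0.25 for i.i.d. uniform bases);
    the (1 - p) factor counts only matches preceded by a mismatch.
    Assumption: a textbook approximation, not the formula of this record.
    """
    return math.exp(-(1.0 - p) * m * n * p ** k)


def simulated_lengths(m: int, n: int, trials: int, seed: int = 1) -> List[int]:
    """Longest-match lengths for `trials` pairs of i.i.d. uniform DNA sequences."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        a = "".join(rng.choice("ACGT") for _ in range(m))
        b = "".join(rng.choice("ACGT") for _ in range(n))
        out.append(longest_common_word(a, b))
    return out


def empirical_cdf(lengths: List[int], k: int) -> float:
    """Fraction of simulated longest matches strictly shorter than k."""
    return sum(1 for x in lengths if x < k) / len(lengths)


if __name__ == "__main__":
    m = n = 100
    lengths = simulated_lengths(m, n, trials=100)
    # Columns: k, simulated P(L < k), heuristic P(L < k)
    for k in range(4, 10):
        print(k, round(empirical_cdf(lengths, k), 3),
              round(poisson_clump_cdf(k, m, n), 3))
```

The sketch simulates each sequence pair once and then evaluates the empirical distribution at every k, so it avoids re-simulating per threshold. The quadratic dynamic programme keeps it short but limits it to modest sequence lengths. A suffix-tree or suffix-array method would be the usual choice for long sequences.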