Rare Events and Conditional Events on Random Strings

Some strings -the texts- are assumed to be randomly generated, according to a probability model that is either a Bernoulli model or a Markov model. A rare event is the over or under-representation of a word or a set of words. The aim of this paper is twofold. First, a single word is given. One studi...

Full description

Bibliographic Details
Main Authors: Mireille Régnier, Alain Denise
Format: Article
Language:English
Published: Discrete Mathematics & Theoretical Computer Science 2004-01-01
Series:Discrete Mathematics & Theoretical Computer Science
Subjects:
Online Access:https://dmtcs.episciences.org/310/pdf
_version_ 1797270186737270784
author Mireille Régnier
Alain Denise
author_facet Mireille Régnier
Alain Denise
author_sort Mireille Régnier
collection DOAJ
description Some strings -the texts- are assumed to be randomly generated, according to a probability model that is either a Bernoulli model or a Markov model. A rare event is the over or under-representation of a word or a set of words. The aim of this paper is twofold. First, a single word is given. One studies the tail distribution of the number of its occurrences. Sharp large deviation estimates are derived. Second, one assumes that a given word is overrepresented. The distribution of a second word is studied; formulae for the expectation and the variance are derived. In both cases, the formulae are accurate and actually computable. These results have applications in computational biology, where a genome is viewed as a text.
first_indexed 2024-04-25T02:00:16Z
format Article
id doaj.art-09a3d5b6f30a450db13af91c38e4f7d1
institution Directory Open Access Journal
issn 1365-8050
language English
last_indexed 2024-04-25T02:00:16Z
publishDate 2004-01-01
publisher Discrete Mathematics & Theoretical Computer Science
record_format Article
series Discrete Mathematics & Theoretical Computer Science
spelling doaj.art-09a3d5b6f30a450db13af91c38e4f7d12024-03-07T15:06:37ZengDiscrete Mathematics & Theoretical Computer ScienceDiscrete Mathematics & Theoretical Computer Science1365-80502004-01-01Vol. 6 no. 210.46298/dmtcs.310310Rare Events and Conditional Events on Random StringsMireille Régnier0Alain Denise1https://orcid.org/0000-0003-4484-4996AlgorithmsLaboratoire de Recherche en InformatiqueSome strings -the texts- are assumed to be randomly generated, according to a probability model that is either a Bernoulli model or a Markov model. A rare event is the over or under-representation of a word or a set of words. The aim of this paper is twofold. First, a single word is given. One studies the tail distribution of the number of its occurrences. Sharp large deviation estimates are derived. Second, one assumes that a given word is overrepresented. The distribution of a second word is studied; formulae for the expectation and the variance are derived. In both cases, the formulae are accurate and actually computable. These results have applications in computational biology, where a genome is viewed as a text.https://dmtcs.episciences.org/310/pdfcomputable closed formulaelarge deviationscombinatoricsgenerating fumctionswordsgenomecomputable closed formulae.[info.info-dm] computer science [cs]/discrete mathematics [cs.dm]
spellingShingle Mireille Régnier
Alain Denise
Rare Events and Conditional Events on Random Strings
Discrete Mathematics & Theoretical Computer Science
computable closed formulae
large deviations
combinatorics
generating fumctions
words
genome
computable closed formulae.
[info.info-dm] computer science [cs]/discrete mathematics [cs.dm]
title Rare Events and Conditional Events on Random Strings
title_full Rare Events and Conditional Events on Random Strings
title_fullStr Rare Events and Conditional Events on Random Strings
title_full_unstemmed Rare Events and Conditional Events on Random Strings
title_short Rare Events and Conditional Events on Random Strings
title_sort rare events and conditional events on random strings
topic computable closed formulae
large deviations
combinatorics
generating fumctions
words
genome
computable closed formulae.
[info.info-dm] computer science [cs]/discrete mathematics [cs.dm]
url https://dmtcs.episciences.org/310/pdf
work_keys_str_mv AT mireilleregnier rareeventsandconditionaleventsonrandomstrings
AT alaindenise rareeventsandconditionaleventsonrandomstrings