Rare Events and Conditional Events on Random Strings
Some strings -the texts- are assumed to be randomly generated, according to a probability model that is either a Bernoulli model or a Markov model. A rare event is the over or under-representation of a word or a set of words. The aim of this paper is twofold. First, a single word is given. One studi...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
Discrete Mathematics & Theoretical Computer Science
2004-01-01
|
Series: | Discrete Mathematics & Theoretical Computer Science |
Subjects: | |
Online Access: | https://dmtcs.episciences.org/310/pdf |
_version_ | 1797270186737270784 |
---|---|
author | Mireille Régnier Alain Denise |
author_facet | Mireille Régnier Alain Denise |
author_sort | Mireille Régnier |
collection | DOAJ |
description | Some strings -the texts- are assumed to be randomly generated, according to a probability model that is either a Bernoulli model or a Markov model. A rare event is the over or under-representation of a word or a set of words. The aim of this paper is twofold. First, a single word is given. One studies the tail distribution of the number of its occurrences. Sharp large deviation estimates are derived. Second, one assumes that a given word is overrepresented. The distribution of a second word is studied; formulae for the expectation and the variance are derived. In both cases, the formulae are accurate and actually computable. These results have applications in computational biology, where a genome is viewed as a text. |
first_indexed | 2024-04-25T02:00:16Z |
format | Article |
id | doaj.art-09a3d5b6f30a450db13af91c38e4f7d1 |
institution | Directory Open Access Journal |
issn | 1365-8050 |
language | English |
last_indexed | 2024-04-25T02:00:16Z |
publishDate | 2004-01-01 |
publisher | Discrete Mathematics & Theoretical Computer Science |
record_format | Article |
series | Discrete Mathematics & Theoretical Computer Science |
spelling | doaj.art-09a3d5b6f30a450db13af91c38e4f7d12024-03-07T15:06:37ZengDiscrete Mathematics & Theoretical Computer ScienceDiscrete Mathematics & Theoretical Computer Science1365-80502004-01-01Vol. 6 no. 210.46298/dmtcs.310310Rare Events and Conditional Events on Random StringsMireille Régnier0Alain Denise1https://orcid.org/0000-0003-4484-4996AlgorithmsLaboratoire de Recherche en InformatiqueSome strings -the texts- are assumed to be randomly generated, according to a probability model that is either a Bernoulli model or a Markov model. A rare event is the over or under-representation of a word or a set of words. The aim of this paper is twofold. First, a single word is given. One studies the tail distribution of the number of its occurrences. Sharp large deviation estimates are derived. Second, one assumes that a given word is overrepresented. The distribution of a second word is studied; formulae for the expectation and the variance are derived. In both cases, the formulae are accurate and actually computable. These results have applications in computational biology, where a genome is viewed as a text.https://dmtcs.episciences.org/310/pdfcomputable closed formulaelarge deviationscombinatoricsgenerating fumctionswordsgenomecomputable closed formulae.[info.info-dm] computer science [cs]/discrete mathematics [cs.dm] |
spellingShingle | Mireille Régnier Alain Denise Rare Events and Conditional Events on Random Strings Discrete Mathematics & Theoretical Computer Science computable closed formulae large deviations combinatorics generating fumctions words genome computable closed formulae. [info.info-dm] computer science [cs]/discrete mathematics [cs.dm] |
title | Rare Events and Conditional Events on Random Strings |
title_full | Rare Events and Conditional Events on Random Strings |
title_fullStr | Rare Events and Conditional Events on Random Strings |
title_full_unstemmed | Rare Events and Conditional Events on Random Strings |
title_short | Rare Events and Conditional Events on Random Strings |
title_sort | rare events and conditional events on random strings |
topic | computable closed formulae large deviations combinatorics generating fumctions words genome computable closed formulae. [info.info-dm] computer science [cs]/discrete mathematics [cs.dm] |
url | https://dmtcs.episciences.org/310/pdf |
work_keys_str_mv | AT mireilleregnier rareeventsandconditionaleventsonrandomstrings AT alaindenise rareeventsandconditionaleventsonrandomstrings |