Maximal Repetitions in Written Texts: Finite Energy Hypothesis vs. Strong Hilberg Conjecture

Bibliographic Details
Main Author: Łukasz Dębowski
Format: Article
Language: English
Published: MDPI AG 2015-08-01
Series: Entropy
Online Access: http://www.mdpi.com/1099-4300/17/8/5903
Volume: 17
Issue: 8
Pages: 5903-5919
DOI: 10.3390/e17085903
ISSN: 1099-4300
Affiliation: Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warszawa, Poland
Collection: DOAJ (Directory of Open Access Journals)
Keywords: finite energy processes; Hilberg’s conjecture; entropy rate; maximal repetition; natural language
Description: The article discusses two mutually incompatible hypotheses, both of which can be related to entropy, about the stochastic mechanism that generates texts in natural language. The first hypothesis, the finite energy hypothesis, assumes that texts are generated by a process with exponentially decaying probabilities. This hypothesis implies a logarithmic upper bound for maximal repetition as a function of text length. The second hypothesis, the strong Hilberg conjecture, assumes that topological entropy grows as a power law in the block length. This hypothesis leads to a hyperlogarithmic lower bound for maximal repetition. In a study of 35 written texts in German, English and French, maximal repetition in natural language is found to grow hyperlogarithmically. Hence the finite energy hypothesis is rejected, and the strong Hilberg conjecture is partly corroborated.
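The record's central quantity is maximal repetition: the length of the longest substring that occurs at least twice in a text, with overlapping occurrences allowed. Topological entropy, in turn, is tied to the number of distinct blocks of length k. Below is a minimal Python sketch, not the author's code, that computes both on a character string. The function names (`has_repeat`, `maximal_repetition`, `distinct_blocks`, `repetition_profile`) are hypothetical. Counting blocks in one finite text is only a rough empirical proxy for topological entropy, which is defined for the underlying process.

```python
import math
from typing import Iterable, List, Tuple


def has_repeat(text: str, k: int) -> bool:
    """True if some substring of length k occurs at two different positions.

    Occurrences may overlap, e.g. "aa" occurs twice in "aaa".
    """
    seen = set()
    for i in range(len(text) - k + 1):
        block = text[i:i + k]
        if block in seen:
            return True
        seen.add(block)
    return False


def maximal_repetition(text: str) -> int:
    """Length of the longest substring occurring at least twice in text.

    Binary search is valid because the property is monotone: if a block of
    length k repeats, its prefix of length k - 1 repeats at the same positions.
    """
    lo, hi = 0, max(len(text) - 1, 0)  # a repeat needs two distinct start positions
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_repeat(text, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def distinct_blocks(text: str, k: int) -> int:
    """Number of distinct substrings of length k observed in text."""
    return len({text[i:i + k] for i in range(len(text) - k + 1)})


def repetition_profile(text: str, lengths: Iterable[int]) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: (n, maximal repetition of the length-n prefix, log n).

    Every n in lengths must be positive so that log n is defined.
    """
    return [(n, maximal_repetition(text[:n]), math.log(n)) for n in lengths]


if __name__ == "__main__":
    sample = "banana"
    print(maximal_repetition(sample))  # 3 ("ana" occurs at positions 1 and 3)
    print(distinct_blocks(sample, 2))  # 3 ("ba", "an", "na")
```

To relate this to the two hypotheses, one could compute `repetition_profile` on prefixes of a long text and check whether maximal repetition outgrows a constant multiple of log n. The sketch fixes no fitting procedure and reproduces none of the article's results. Its set-of-substrings search costs roughly O(n·k) per probe. For book-length texts, a suffix array with an LCP array is the usual faster way to obtain the same quantity.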