Maximal Repetitions in Written Texts: Finite Energy Hypothesis vs. Strong Hilberg Conjecture
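The article discusses two mutually-incompatible hypotheses about the stochastic mechanism of the generation of texts in natural language, which could be related to entropy. The first hypothesis, the finite energy hypothesis, assumes that texts are generated by a process with exponentially-decaying probabilities. This hypothesis implies a logarithmic upper bound for maximal repetition, as a function of the text length. The second hypothesis, the strong Hilberg conjecture, assumes that the topological entropy grows as a power law. This hypothesis leads to a hyperlogarithmic lower bound for maximal repetition. By a study of 35 written texts in German, English and French, it is found that the hyperlogarithmic growth of maximal repetition holds for natural language. In this way, the finite energy hypothesis is rejected, and the strong Hilberg conjecture is partly corroborated.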
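The quantity named in the title, maximal repetition, can be illustrated with a short sketch. It assumes the usual definition (the length of the longest substring that occurs at least twice in the text, overlaps allowed), which this record does not spell out. The function name `maximal_repetition` is hypothetical, and the set-based search is a simple stand-in, not the method used in the article.

```python
def maximal_repetition(text: str) -> int:
    """Length of the longest substring occurring at least twice in `text`.

    Assumption: overlapping occurrences count as repeats.
    """

    def has_repeat(k: int) -> bool:
        # True if some substring of length k appears at two different positions.
        seen = set()
        for i in range(len(text) - k + 1):
            chunk = text[i:i + k]
            if chunk in seen:
                return True
            seen.add(chunk)
        return False

    # A repeat of length k implies a repeat of length k - 1,
    # so the largest repeated length can be found by binary search.
    lo, hi = 0, max(len(text) - 1, 0)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_repeat(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    for sample in ["abcd", "abcabc", "aaaa"]:
        print(sample, maximal_repetition(sample))
```

Applying such a function to prefixes of increasing length and comparing the result with the logarithm of the prefix length is one way to see the distinction the abstract draws between logarithmic and faster-than-logarithmic growth. For long texts a suffix-array or suffix-automaton approach would be much faster than this sketch.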
Main Author: | Łukasz Dębowski |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2015-08-01 |
Series: | Entropy |
Subjects: | finite energy processes; Hilberg’s conjecture; entropy rate; maximal repetition; natural language |
Online Access: | http://www.mdpi.com/1099-4300/17/8/5903 |
_version_ | 1797999534555529216 |
---|---|
author | Łukasz Dębowski |
collection | DOAJ |
description | The article discusses two mutually-incompatible hypotheses about the stochastic mechanism of the generation of texts in natural language, which could be related to entropy. The first hypothesis, the finite energy hypothesis, assumes that texts are generated by a process with exponentially-decaying probabilities. This hypothesis implies a logarithmic upper bound for maximal repetition, as a function of the text length. The second hypothesis, the strong Hilberg conjecture, assumes that the topological entropy grows as a power law. This hypothesis leads to a hyperlogarithmic lower bound for maximal repetition. By a study of 35 written texts in German, English and French, it is found that the hyperlogarithmic growth of maximal repetition holds for natural language. In this way, the finite energy hypothesis is rejected, and the strong Hilberg conjecture is partly corroborated. |
first_indexed | 2024-04-11T11:06:12Z |
format | Article |
id | doaj.art-461769c1658844358f089dc1d2be07b2 |
institution | Directory of Open Access Journals |
issn | 1099-4300 |
language | English |
last_indexed | 2024-04-11T11:06:12Z |
publishDate | 2015-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Entropy |
spelling | doaj.art-461769c1658844358f089dc1d2be07b2; 2022-12-22T04:28:21Z; eng; MDPI AG; Entropy; ISSN 1099-4300; 2015-08-01; vol. 17, no. 8, pp. 5903-5919; doi:10.3390/e17085903; e17085903; Maximal Repetitions in Written Texts: Finite Energy Hypothesis vs. Strong Hilberg Conjecture; Łukasz Dębowski (Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warszawa, Poland); http://www.mdpi.com/1099-4300/17/8/5903; finite energy processes; Hilberg’s conjecture; entropy rate; maximal repetition; natural language |
title | Maximal Repetitions in Written Texts: Finite Energy Hypothesis vs. Strong Hilberg Conjecture |
topic | finite energy processes; Hilberg’s conjecture; entropy rate; maximal repetition; natural language |
url | http://www.mdpi.com/1099-4300/17/8/5903 |