Asymptotic Analysis of the <i>k</i>th Subword Complexity
Patterns within strings enable us to extract vital information regarding a string’s randomness. Understanding whether a string is random (Showing no to little repetition in patterns) or periodic (showing repetitions in patterns) are described by a value that is called the <i>k</i&...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2020-02-01
|
Series: | Entropy |
Subjects: | |
Online Access: | https://www.mdpi.com/1099-4300/22/2/207 |
_version_ | 1798034962811715584 |
---|---|
author | Lida Ahmadi Mark Daniel Ward |
author_facet | Lida Ahmadi Mark Daniel Ward |
author_sort | Lida Ahmadi |
collection | DOAJ |
description | Patterns within strings enable us to extract vital information regarding a string’s randomness. Understanding whether a string is random (Showing no to little repetition in patterns) or periodic (showing repetitions in patterns) are described by a value that is called the <i>k</i>th Subword Complexity of the character string. By definition, the <i>k</i>th Subword Complexity is the number of distinct substrings of length <i>k</i> that appear in a given string. In this paper, we evaluate the expected value and the second factorial moment (followed by a corollary on the second moment) of the <i>k</i>th Subword Complexity for the binary strings over memory-less sources. We first take a combinatorial approach to derive a probability generating function for the number of occurrences of patterns in strings of finite length. This enables us to have an exact expression for the two moments in terms of patterns’ auto-correlation and correlation polynomials. We then investigate the asymptotic behavior for values of <inline-formula> <math display="inline"> <semantics> <mrow> <mi>k</mi> <mo>=</mo> <mi mathvariant="sans-serif">Θ</mi> <mo>(</mo> <mo form="prefix">log</mo> <mi>n</mi> <mo>)</mo> </mrow> </semantics> </math> </inline-formula>. In the proof, we compare the distribution of the <i>k</i>th Subword Complexity of binary strings to the distribution of distinct prefixes of independent strings stored in a trie. The methodology that we use involves complex analysis, analytical poissonization and depoissonization, the Mellin transform, and saddle point analysis. |
first_indexed | 2024-04-11T20:51:38Z |
format | Article |
id | doaj.art-1701f267c59045328d6d1d66492bbd83 |
institution | Directory Open Access Journal |
issn | 1099-4300 |
language | English |
last_indexed | 2024-04-11T20:51:38Z |
publishDate | 2020-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Entropy |
spelling | doaj.art-1701f267c59045328d6d1d66492bbd832022-12-22T04:03:50ZengMDPI AGEntropy1099-43002020-02-0122220710.3390/e22020207e22020207Asymptotic Analysis of the <i>k</i>th Subword ComplexityLida Ahmadi0Mark Daniel Ward1Department of Mathematics, Purdue University, West Lafayette, IN 47907, USADepartment of Statistics, Purdue University, West Lafayette, IN 47907, USAPatterns within strings enable us to extract vital information regarding a string’s randomness. Understanding whether a string is random (Showing no to little repetition in patterns) or periodic (showing repetitions in patterns) are described by a value that is called the <i>k</i>th Subword Complexity of the character string. By definition, the <i>k</i>th Subword Complexity is the number of distinct substrings of length <i>k</i> that appear in a given string. In this paper, we evaluate the expected value and the second factorial moment (followed by a corollary on the second moment) of the <i>k</i>th Subword Complexity for the binary strings over memory-less sources. We first take a combinatorial approach to derive a probability generating function for the number of occurrences of patterns in strings of finite length. This enables us to have an exact expression for the two moments in terms of patterns’ auto-correlation and correlation polynomials. We then investigate the asymptotic behavior for values of <inline-formula> <math display="inline"> <semantics> <mrow> <mi>k</mi> <mo>=</mo> <mi mathvariant="sans-serif">Θ</mi> <mo>(</mo> <mo form="prefix">log</mo> <mi>n</mi> <mo>)</mo> </mrow> </semantics> </math> </inline-formula>. In the proof, we compare the distribution of the <i>k</i>th Subword Complexity of binary strings to the distribution of distinct prefixes of independent strings stored in a trie. The methodology that we use involves complex analysis, analytical poissonization and depoissonization, the Mellin transform, and saddle point analysis.https://www.mdpi.com/1099-4300/22/2/207subword complexityasymptoticsgenerating functionssaddle point methodprobabilitythe mellin transformmoments |
spellingShingle | Lida Ahmadi Mark Daniel Ward Asymptotic Analysis of the <i>k</i>th Subword Complexity Entropy subword complexity asymptotics generating functions saddle point method probability the mellin transform moments |
title | Asymptotic Analysis of the <i>k</i>th Subword Complexity |
title_full | Asymptotic Analysis of the <i>k</i>th Subword Complexity |
title_fullStr | Asymptotic Analysis of the <i>k</i>th Subword Complexity |
title_full_unstemmed | Asymptotic Analysis of the <i>k</i>th Subword Complexity |
title_short | Asymptotic Analysis of the <i>k</i>th Subword Complexity |
title_sort | asymptotic analysis of the i k i th subword complexity |
topic | subword complexity asymptotics generating functions saddle point method probability the mellin transform moments |
url | https://www.mdpi.com/1099-4300/22/2/207 |
work_keys_str_mv | AT lidaahmadi asymptoticanalysisoftheikithsubwordcomplexity AT markdanielward asymptoticanalysisoftheikithsubwordcomplexity |