Asymptotic Analysis of the <i>k</i>th Subword Complexity

Patterns within strings enable us to extract vital information regarding a string&#8217;s randomness. Understanding whether a string is random (Showing no to little repetition in patterns) or periodic (showing repetitions in patterns) are described by a value that is called the <i>k</i&...

Full description

Bibliographic Details
Main Authors: Lida Ahmadi, Mark Daniel Ward
Format: Article
Language:English
Published: MDPI AG 2020-02-01
Series:Entropy
Subjects:
Online Access:https://www.mdpi.com/1099-4300/22/2/207
_version_ 1798034962811715584
author Lida Ahmadi
Mark Daniel Ward
author_facet Lida Ahmadi
Mark Daniel Ward
author_sort Lida Ahmadi
collection DOAJ
description Patterns within strings enable us to extract vital information regarding a string&#8217;s randomness. Understanding whether a string is random (Showing no to little repetition in patterns) or periodic (showing repetitions in patterns) are described by a value that is called the <i>k</i>th Subword Complexity of the character string. By definition, the <i>k</i>th Subword Complexity is the number of distinct substrings of length <i>k</i> that appear in a given string. In this paper, we evaluate the expected value and the second factorial moment (followed by a corollary on the second moment) of the <i>k</i>th Subword Complexity for the binary strings over memory-less sources. We first take a combinatorial approach to derive a probability generating function for the number of occurrences of patterns in strings of finite length. This enables us to have an exact expression for the two moments in terms of patterns&#8217; auto-correlation and correlation polynomials. We then investigate the asymptotic behavior for values of <inline-formula> <math display="inline"> <semantics> <mrow> <mi>k</mi> <mo>=</mo> <mi mathvariant="sans-serif">&#920;</mi> <mo>(</mo> <mo form="prefix">log</mo> <mi>n</mi> <mo>)</mo> </mrow> </semantics> </math> </inline-formula>. In the proof, we compare the distribution of the <i>k</i>th Subword Complexity of binary strings to the distribution of distinct prefixes of independent strings stored in a trie. The methodology that we use involves complex analysis, analytical poissonization and depoissonization, the Mellin transform, and saddle point analysis.
first_indexed 2024-04-11T20:51:38Z
format Article
id doaj.art-1701f267c59045328d6d1d66492bbd83
institution Directory Open Access Journal
issn 1099-4300
language English
last_indexed 2024-04-11T20:51:38Z
publishDate 2020-02-01
publisher MDPI AG
record_format Article
series Entropy
spelling doaj.art-1701f267c59045328d6d1d66492bbd832022-12-22T04:03:50ZengMDPI AGEntropy1099-43002020-02-0122220710.3390/e22020207e22020207Asymptotic Analysis of the <i>k</i>th Subword ComplexityLida Ahmadi0Mark Daniel Ward1Department of Mathematics, Purdue University, West Lafayette, IN 47907, USADepartment of Statistics, Purdue University, West Lafayette, IN 47907, USAPatterns within strings enable us to extract vital information regarding a string&#8217;s randomness. Understanding whether a string is random (Showing no to little repetition in patterns) or periodic (showing repetitions in patterns) are described by a value that is called the <i>k</i>th Subword Complexity of the character string. By definition, the <i>k</i>th Subword Complexity is the number of distinct substrings of length <i>k</i> that appear in a given string. In this paper, we evaluate the expected value and the second factorial moment (followed by a corollary on the second moment) of the <i>k</i>th Subword Complexity for the binary strings over memory-less sources. We first take a combinatorial approach to derive a probability generating function for the number of occurrences of patterns in strings of finite length. This enables us to have an exact expression for the two moments in terms of patterns&#8217; auto-correlation and correlation polynomials. We then investigate the asymptotic behavior for values of <inline-formula> <math display="inline"> <semantics> <mrow> <mi>k</mi> <mo>=</mo> <mi mathvariant="sans-serif">&#920;</mi> <mo>(</mo> <mo form="prefix">log</mo> <mi>n</mi> <mo>)</mo> </mrow> </semantics> </math> </inline-formula>. In the proof, we compare the distribution of the <i>k</i>th Subword Complexity of binary strings to the distribution of distinct prefixes of independent strings stored in a trie. The methodology that we use involves complex analysis, analytical poissonization and depoissonization, the Mellin transform, and saddle point analysis.https://www.mdpi.com/1099-4300/22/2/207subword complexityasymptoticsgenerating functionssaddle point methodprobabilitythe mellin transformmoments
spellingShingle Lida Ahmadi
Mark Daniel Ward
Asymptotic Analysis of the <i>k</i>th Subword Complexity
Entropy
subword complexity
asymptotics
generating functions
saddle point method
probability
the mellin transform
moments
title Asymptotic Analysis of the <i>k</i>th Subword Complexity
title_full Asymptotic Analysis of the <i>k</i>th Subword Complexity
title_fullStr Asymptotic Analysis of the <i>k</i>th Subword Complexity
title_full_unstemmed Asymptotic Analysis of the <i>k</i>th Subword Complexity
title_short Asymptotic Analysis of the <i>k</i>th Subword Complexity
title_sort asymptotic analysis of the i k i th subword complexity
topic subword complexity
asymptotics
generating functions
saddle point method
probability
the mellin transform
moments
url https://www.mdpi.com/1099-4300/22/2/207
work_keys_str_mv AT lidaahmadi asymptoticanalysisoftheikithsubwordcomplexity
AT markdanielward asymptoticanalysisoftheikithsubwordcomplexity