Asymptotic Analysis of the <i>k</i>th Subword Complexity

Patterns within strings enable us to extract vital information regarding a string’s randomness. Understanding whether a string is random (Showing no to little repetition in patterns) or periodic (showing repetitions in patterns) are described by a value that is called the <i>k</i&...

Full description

Bibliographic Details
Main Authors:	Lida Ahmadi, Mark Daniel Ward
Format:	Article
Language:	English
Published:	MDPI AG 2020-02-01
Series:	Entropy
Subjects:	subword complexity asymptotics generating functions saddle point method probability the mellin transform moments
Online Access:	https://www.mdpi.com/1099-4300/22/2/207

_version_	1798034962811715584
author	Lida Ahmadi Mark Daniel Ward
author_facet	Lida Ahmadi Mark Daniel Ward
author_sort	Lida Ahmadi
collection	DOAJ
description	Patterns within strings enable us to extract vital information regarding a string’s randomness. Understanding whether a string is random (Showing no to little repetition in patterns) or periodic (showing repetitions in patterns) are described by a value that is called the <i>k</i>th Subword Complexity of the character string. By definition, the <i>k</i>th Subword Complexity is the number of distinct substrings of length <i>k</i> that appear in a given string. In this paper, we evaluate the expected value and the second factorial moment (followed by a corollary on the second moment) of the <i>k</i>th Subword Complexity for the binary strings over memory-less sources. We first take a combinatorial approach to derive a probability generating function for the number of occurrences of patterns in strings of finite length. This enables us to have an exact expression for the two moments in terms of patterns’ auto-correlation and correlation polynomials. We then investigate the asymptotic behavior for values of <inline-formula> <math display="inline"> <semantics> <mrow> <mi>k</mi> <mo>=</mo> <mi mathvariant="sans-serif">Θ</mi> <mo>(</mo> <mo form="prefix">log</mo> <mi>n</mi> <mo>)</mo> </mrow> </semantics> </math> </inline-formula>. In the proof, we compare the distribution of the <i>k</i>th Subword Complexity of binary strings to the distribution of distinct prefixes of independent strings stored in a trie. The methodology that we use involves complex analysis, analytical poissonization and depoissonization, the Mellin transform, and saddle point analysis.
first_indexed	2024-04-11T20:51:38Z
format	Article
id	doaj.art-1701f267c59045328d6d1d66492bbd83
institution	Directory Open Access Journal
issn	1099-4300
language	English
last_indexed	2024-04-11T20:51:38Z
publishDate	2020-02-01
publisher	MDPI AG
record_format	Article
series	Entropy
spelling	doaj.art-1701f267c59045328d6d1d66492bbd832022-12-22T04:03:50ZengMDPI AGEntropy1099-43002020-02-0122220710.3390/e22020207e22020207Asymptotic Analysis of the <i>k</i>th Subword ComplexityLida Ahmadi0Mark Daniel Ward1Department of Mathematics, Purdue University, West Lafayette, IN 47907, USADepartment of Statistics, Purdue University, West Lafayette, IN 47907, USAPatterns within strings enable us to extract vital information regarding a string’s randomness. Understanding whether a string is random (Showing no to little repetition in patterns) or periodic (showing repetitions in patterns) are described by a value that is called the <i>k</i>th Subword Complexity of the character string. By definition, the <i>k</i>th Subword Complexity is the number of distinct substrings of length <i>k</i> that appear in a given string. In this paper, we evaluate the expected value and the second factorial moment (followed by a corollary on the second moment) of the <i>k</i>th Subword Complexity for the binary strings over memory-less sources. We first take a combinatorial approach to derive a probability generating function for the number of occurrences of patterns in strings of finite length. This enables us to have an exact expression for the two moments in terms of patterns’ auto-correlation and correlation polynomials. We then investigate the asymptotic behavior for values of <inline-formula> <math display="inline"> <semantics> <mrow> <mi>k</mi> <mo>=</mo> <mi mathvariant="sans-serif">Θ</mi> <mo>(</mo> <mo form="prefix">log</mo> <mi>n</mi> <mo>)</mo> </mrow> </semantics> </math> </inline-formula>. In the proof, we compare the distribution of the <i>k</i>th Subword Complexity of binary strings to the distribution of distinct prefixes of independent strings stored in a trie. The methodology that we use involves complex analysis, analytical poissonization and depoissonization, the Mellin transform, and saddle point analysis.https://www.mdpi.com/1099-4300/22/2/207subword complexityasymptoticsgenerating functionssaddle point methodprobabilitythe mellin transformmoments
spellingShingle	Lida Ahmadi Mark Daniel Ward Asymptotic Analysis of the <i>k</i>th Subword Complexity Entropy subword complexity asymptotics generating functions saddle point method probability the mellin transform moments
title	Asymptotic Analysis of the <i>k</i>th Subword Complexity
title_full	Asymptotic Analysis of the <i>k</i>th Subword Complexity
title_fullStr	Asymptotic Analysis of the <i>k</i>th Subword Complexity
title_full_unstemmed	Asymptotic Analysis of the <i>k</i>th Subword Complexity
title_short	Asymptotic Analysis of the <i>k</i>th Subword Complexity
title_sort	asymptotic analysis of the i k i th subword complexity
topic	subword complexity asymptotics generating functions saddle point method probability the mellin transform moments
url	https://www.mdpi.com/1099-4300/22/2/207
work_keys_str_mv	AT lidaahmadi asymptoticanalysisoftheikithsubwordcomplexity AT markdanielward asymptoticanalysisoftheikithsubwordcomplexity

Asymptotic Analysis of the <i>k</i>th Subword Complexity

Similar Items