Menzerath’s Law in the Syntax of Languages Compared with Random Sentences

The Menzerath law is considered to show an aspect of the complexity underlying natural language. This law suggests that, for a linguistic unit, the size (y) of a linguistic construct decreases as the number (x) of constructs in the unit increases. This article i...

Bibliographic Details
Main Author:	Kumiko Tanaka-Ishii
Format:	Article
Language:	English
Published:	MDPI AG 2021-05-01
Series:	Entropy
Subjects:	Menzerath law; complexity; natural language; syntax
Online Access:	https://www.mdpi.com/1099-4300/23/6/661

_version_	1797532741881823232
author	Kumiko Tanaka-Ishii
author_facet	Kumiko Tanaka-Ishii
author_sort	Kumiko Tanaka-Ishii
collection	DOAJ
description	The Menzerath law is considered to show an aspect of the complexity underlying natural language. This law suggests that, for a linguistic unit, the size (y) of a linguistic construct decreases as the number (x) of constructs in the unit increases. This article investigates this property syntactically, with x as the number of constituents modifying the main predicate of a sentence and y as the size of those constituents in terms of the number of words. Following previous articles that demonstrated that the Menzerath property held for dependency corpora, such as in Czech and Ukrainian, this article first examines how well the property applies across languages by using the entire Universal Dependency dataset ver. 2.3, including 76 languages over 129 corpora and the Penn Treebank (PTB). The results show that the law holds reasonably well for x > 2. Then, for comparison, the property is investigated with syntactically randomized sentences generated from the PTB. These results show that the property is almost reproducible even from simple random data. Further analysis of the property highlights more detailed characteristics of natural language.
first_indexed	2024-03-10T11:04:35Z
format	Article
id	doaj.art-96d71ebbcd02457eb200f24d5d07fb17
institution	Directory of Open Access Journals
issn	1099-4300
language	English
last_indexed	2024-03-10T11:04:35Z
publishDate	2021-05-01
publisher	MDPI AG
record_format	Article
series	Entropy
spelling	doaj.art-96d71ebbcd02457eb200f24d5d07fb17 | 2023-11-21T21:15:32Z | eng | MDPI AG | Entropy | 1099-4300 | 2021-05-01 | 23 | 6 | 661 | 10.3390/e23060661 | Menzerath’s Law in the Syntax of Languages Compared with Random Sentences | Kumiko Tanaka-Ishii (Research Center for Advanced Technology, The University of Tokyo, Tokyo 153-8904, Japan) | The Menzerath law is considered to show an aspect of the complexity underlying natural language. This law suggests that, for a linguistic unit, the size (y) of a linguistic construct decreases as the number (x) of constructs in the unit increases. This article investigates this property syntactically, with x as the number of constituents modifying the main predicate of a sentence and y as the size of those constituents in terms of the number of words. Following previous articles that demonstrated that the Menzerath property held for dependency corpora, such as in Czech and Ukrainian, this article first examines how well the property applies across languages by using the entire Universal Dependency dataset ver. 2.3, including 76 languages over 129 corpora and the Penn Treebank (PTB). The results show that the law holds reasonably well for x > 2. Then, for comparison, the property is investigated with syntactically randomized sentences generated from the PTB. These results show that the property is almost reproducible even from simple random data. Further analysis of the property highlights more detailed characteristics of natural language. | https://www.mdpi.com/1099-4300/23/6/661 | Menzerath law; complexity; natural language; syntax
spellingShingle	Kumiko Tanaka-Ishii Menzerath’s Law in the Syntax of Languages Compared with Random Sentences Entropy Menzerath law complexity natural language syntax
title	Menzerath’s Law in the Syntax of Languages Compared with Random Sentences
title_full	Menzerath’s Law in the Syntax of Languages Compared with Random Sentences
title_fullStr	Menzerath’s Law in the Syntax of Languages Compared with Random Sentences
title_full_unstemmed	Menzerath’s Law in the Syntax of Languages Compared with Random Sentences
title_short	Menzerath’s Law in the Syntax of Languages Compared with Random Sentences
title_sort	menzerath s law in the syntax of languages compared with random sentences
topic	Menzerath law; complexity; natural language; syntax
url	https://www.mdpi.com/1099-4300/23/6/661
work_keys_str_mv	AT kumikotanakaishii menzerathslawinthesyntaxoflanguagescomparedwithrandomsentences

Similar Items