Characterizing the Typical Information Curves of Diverse Languages


Bibliographic Details
Main Authors: Josef Klafka, Daniel Yurovsky
Format: Article
Language: English
Published: MDPI AG 2021-10-01
Series: Entropy
Online Access: https://www.mdpi.com/1099-4300/23/10/1300
Description
Summary: Optimal coding theories of language predict that speakers will keep the amount of information in their utterances relatively uniform under the constraints imposed by their language, but how much do these constraints influence information structure, and how does this influence vary across languages? We present a novel method for characterizing the information structure of sentences across a diverse set of languages. While the structure of English is broadly consistent with the shape predicted by optimal coding, many languages are not consistent with this prediction. We proceed to show that the characteristic information curves of languages are partly related to a variety of typological features from phonology to word order. These results present an important step in the direction of exploring upper bounds for the extent to which linguistic codes can be optimal for communication.
ISSN: 1099-4300