How to Distinguish Languages and Dialects


Bibliographic Details
Main Author: Wichmann, Søren
Format: Article
Language: English
Published: The MIT Press 2020-01-01
Series: Computational Linguistics
Online Access: https://www.mitpressjournals.org/doi/abs/10.1162/coli_a_00366
author Wichmann, Søren
collection DOAJ
description The terms “language” and “dialect” are ingrained, but linguists nevertheless tend to agree that it is impossible to apply a non-arbitrary distinction such that two speech varieties can be identified as either distinct languages or two dialects of one and the same language. A database of lexical information for more than 7,500 speech varieties, however, unveils a strong tendency for linguistic distances to be bimodally distributed. For a given language group, the linguistic distances pertaining to either cluster can be teased apart by identifying a mixture of normal distributions within the data, fitting curves to separate them, and finding the point where the curves cross. The thresholds identified are remarkably consistent across data sets, qualifying their mean as a universal criterion for distinguishing between language and dialect pairs. The mean of the thresholds identified translates into a temporal distance of around one to one-and-a-half millennia (1,075–1,635 years).
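The procedure outlined in the description (fit a mixture of two normal distributions to pairwise distances, then take the point where the fitted curves cross) can be illustrated with a minimal sketch. This is not the author's implementation: the distance values are synthetic, the function names fit_two_normals and crossing_point are hypothetical, a plain EM loop stands in for whatever fitting routine the article uses, and weighting each curve by its mixture proportion is an assumption.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_normals(data, iterations=100):
    """Fit a two-component 1-D normal mixture by EM.

    Returns two (weight, mean, sd) tuples, sorted by mean.
    """
    xs = sorted(data)
    n = len(xs)
    # Initialise the components from the lower and upper halves of the data.
    params = []
    for group in (xs[: n // 2], xs[n // 2:]):
        mu = sum(group) / len(group)
        var = sum((x - mu) ** 2 for x in group) / len(group)
        params.append((len(group) / n, mu, max(math.sqrt(var), 1e-6)))
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, mu, sd) for w, mu, sd in params]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and sd of each component.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            mu = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nk
            params[k] = (nk / n, mu, max(math.sqrt(var), 1e-6))
    return sorted(params, key=lambda p: p[1])


def crossing_point(low, high, steps=100):
    """Bisect between the two means for the point where the weighted curves cross.

    Assumes the lower component dominates at its own mean and the upper
    component dominates at its own mean (i.e. the clusters are separated).
    """
    (w1, m1, s1), (w2, m2, s2) = low, high
    a, b = m1, m2
    for _ in range(steps):
        mid = (a + b) / 2.0
        if w1 * normal_pdf(mid, m1, s1) > w2 * normal_pdf(mid, m2, s2):
            a = mid  # lower curve still on top: crossing lies to the right
        else:
            b = mid
    return (a + b) / 2.0


# Synthetic, made-up distances: a smaller low-distance cluster ("dialect" pairs)
# and a larger high-distance cluster ("language" pairs).
random.seed(0)
distances = [random.gauss(0.45, 0.08) for _ in range(300)]
distances += [random.gauss(0.85, 0.05) for _ in range(700)]

low, high = fit_two_normals(distances)
threshold = crossing_point(low, high)
print("threshold between clusters:", round(threshold, 3))
```

Pairs with a distance below the printed threshold would fall in the lower cluster and pairs above it in the upper cluster. The numbers here carry no empirical meaning, since the data are simulated.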
first_indexed 2024-12-11T03:10:44Z
format Article
id doaj.art-24adacb0e85440a29376d37dd8c6eb21
institution Directory Open Access Journal
issn 0891-2017, 1530-9312
language English
last_indexed 2024-12-11T03:10:44Z
publishDate 2020-01-01
publisher The MIT Press
record_format Article
series Computational Linguistics
spelling doaj.art-24adacb0e85440a29376d37dd8c6eb21; 2022-12-22T01:22:52Z; eng; The MIT Press; Computational Linguistics; 0891-2017; 1530-9312; 2020-01-01; vol. 45, no. 4, pp. 823-831; doi:10.1162/coli_a_00366; How to Distinguish Languages and Dialects; Wichmann, Søren; https://www.mitpressjournals.org/doi/abs/10.1162/coli_a_00366
title How to Distinguish Languages and Dialects
url https://www.mitpressjournals.org/doi/abs/10.1162/coli_a_00366