Languages with more speakers tend to be harder to (machine-)learn

Abstract: Computational language models (LMs), most notably exemplified by the widespread success of OpenAI's ChatGPT chatbot, show impressive performance on a wide range of linguistic tasks, thus providing cognitive science and linguistics with a computational working model to empirically study...

Bibliographic Details
Main Authors: Alexander Koplenig, Sascha Wolfer
Format: Article
Language: English
Published: Nature Portfolio 2023-10-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-45373-z
collection DOAJ
description Abstract Computational language models (LMs), most notably exemplified by the widespread success of OpenAI's ChatGPT chatbot, show impressive performance on a wide range of linguistic tasks, thus providing cognitive science and linguistics with a computational working model to empirically study different aspects of human language. Here, we use LMs to test the hypothesis that languages with more speakers tend to be easier to learn. In two experiments, we train several LMs—ranging from very simple n-gram models to state-of-the-art deep neural networks—on written cross-linguistic corpus data covering 1293 different languages and statistically estimate learning difficulty. Using a variety of quantitative methods and machine learning techniques to account for phylogenetic relatedness and geographical proximity of languages, we show that there is robust evidence for a relationship between learning difficulty and speaker population size. However, contrary to expectations derived from previous research, our results suggest that languages with more speakers tend to be harder to learn.
first_indexed 2024-03-11T15:14:46Z
format Article
id doaj.art-66ed9d93349646408563637f485c9d91
institution Directory Open Access Journal
issn 2045-2322
language English
last_indexed 2024-03-11T15:14:46Z
publishDate 2023-10-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-66ed9d93349646408563637f485c9d91 | 2023-10-29T12:22:11Z | eng | Nature Portfolio | Scientific Reports | 2045-2322 | 2023-10-01 | 131118 | 10.1038/s41598-023-45373-z | Languages with more speakers tend to be harder to (machine-)learn | Alexander Koplenig (0), Sascha Wolfer (1) | Leibniz Institute for the German Language (IDS) | https://doi.org/10.1038/s41598-023-45373-z