Languages with more speakers tend to be harder to (machine-)learn
Abstract: Computational language models (LMs), most notably exemplified by the widespread success of OpenAI's ChatGPT chatbot, show impressive performance on a wide range of linguistic tasks, thus providing cognitive science and linguistics with a computational working model to empirically study...
Main Authors: | Alexander Koplenig, Sascha Wolfer |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-10-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-023-45373-z |
_version_ | 1797647333647712256 |
---|---|
author | Alexander Koplenig, Sascha Wolfer |
collection | DOAJ |
description | Abstract: Computational language models (LMs), most notably exemplified by the widespread success of OpenAI's ChatGPT chatbot, show impressive performance on a wide range of linguistic tasks, thus providing cognitive science and linguistics with a computational working model to empirically study different aspects of human language. Here, we use LMs to test the hypothesis that languages with more speakers tend to be easier to learn. In two experiments, we train several LMs, ranging from very simple n-gram models to state-of-the-art deep neural networks, on written cross-linguistic corpus data covering 1293 different languages and statistically estimate learning difficulty. Using a variety of quantitative methods and machine learning techniques to account for phylogenetic relatedness and geographical proximity of languages, we show that there is robust evidence for a relationship between learning difficulty and speaker population size. However, contrary to expectations derived from previous research, our results suggest that languages with more speakers tend to be harder to learn. |
first_indexed | 2024-03-11T15:14:46Z |
format | Article |
id | doaj.art-66ed9d93349646408563637f485c9d91 |
institution | Directory of Open Access Journals |
issn | 2045-2322 |
language | English |
last_indexed | 2024-03-11T15:14:46Z |
publishDate | 2023-10-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-66ed9d93349646408563637f485c9d91; 2023-10-29T12:22:11Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2023-10-01; 131118; 10.1038/s41598-023-45373-z; Languages with more speakers tend to be harder to (machine-)learn; Alexander Koplenig (Leibniz Institute for the German Language (IDS)); Sascha Wolfer (Leibniz Institute for the German Language (IDS)); https://doi.org/10.1038/s41598-023-45373-z |
title | Languages with more speakers tend to be harder to (machine-)learn |
url | https://doi.org/10.1038/s41598-023-45373-z |