The method behind the unprecedented production of indicators of the presence of languages in the Internet

Bibliographic Details
Main Authors: Daniel Pimienta, Álvaro Blanco, Gilvan Müller de Oliveira
Format: Article
Language: English
Published: Frontiers Media S.A., 2023-05-01
Series: Frontiers in Research Metrics and Analytics
Subjects: languages, web, Internet, indicators and metrics, methodology, bias
Online Access: https://www.frontiersin.org/articles/10.3389/frma.2023.1149347/full
Collection: DOAJ (Directory of Open Access Journals)
ISSN: 2504-0537
Volume: 8, article 1149347
DOI: 10.3389/frma.2023.1149347
Affiliations: Daniel Pimienta and Álvaro Blanco, Observatory of Linguistic and Cultural Diversity on the Internet, Nice, France; Gilvan Müller de Oliveira, UNESCO Chair on Language Policies for Multilingualism, Federal University of Santa Catarina (UFSC), Florianopolis, Brazil

Abstract
Reliable, up-to-date indicators of the presence of languages on the Internet are required to drive language policies efficiently, to forecast the e-commerce market, and to support further research in the field of digital support for languages. This article presents a complete description of the methodological elements involved in producing an unprecedented set of indicators of the Internet presence of the 329 languages with more than 1 million L1 speakers. Special emphasis is given to the treatment of the comprehensive set of biases involved in the process, whether they stem from the method itself or from the various sources used in the modeling. The biases of other sources providing similar data are also discussed; in particular, the article shows how failing to account for the high level of multilingualism of the Web leads to a huge overestimation of the presence of English. The detailed list of sources is given in the annexes. For the first time in the history of the Internet, indicators of the virtual presence of a large set of languages could enable progress in the fields of the economy of languages, the cyber-geography of languages, and language policies for multilingualism.