Linguistic corpora of understudied languages: Do they make sense?

Bibliographic Details
Main Author: Igor Vinogradov
Format: Article
Language: Spanish
Published: Universidad de Costa Rica 2016-05-01
Series: Káñina
Online Access: https://revistas.ucr.ac.cr/index.php/kanina/article/view/24143
collection DOAJ
description A corpus of an understudied language is usually documentary-linguistic in nature and comprises all text material available in that language. However, without text selection, it is impossible to obtain a representative and balanced sample of language use. The lack of these two characteristics makes such a corpus almost useless for quantitative research. Nevertheless, corpora of understudied languages serve a wide range of language documentation objectives. Furthermore, they can provide evidence for the existence of word forms or grammatical features in texts that meet specific search criteria. If such corpora have well-developed linguistic annotation, they can complement grammatical descriptions and dictionaries, and their digital format sets them apart from ordinary text collections. They are especially suitable for typological research, which deals with large amounts of data from different and unrelated languages.
id doaj.art-56fa9ff969f7486887cbb0d097d9770b
institution Directory of Open Access Journals
issn 0378-0473
2215-2636
doi 10.15517/rk.v40i1.24143
volume 40
issue 1
author_affiliation Universidad Nacional Autónoma de México. Fellowship holder at the Instituto de Investigaciones Antropológicas.
topic corpus linguistics
understudied languages
language documentation
quantitative methods