Linguistic corpora of understudied languages: Do they make sense?
A corpus of an understudied language usually has a documentary-linguistic nature and comprises all text material available in a particular language. However, without resorting to text selection, it is impossible to obtain a representative and balanced sample of language use. Lack of these two charact...
Main Author: | Igor Vinogradov |
---|---|
Format: | Article |
Language: | Spanish |
Published: | Universidad de Costa Rica, 2016-05-01 |
Series: | Káñina |
Subjects: | corpus linguistics; understudied languages; language documentation; quantitative methods |
Online Access: | https://revistas.ucr.ac.cr/index.php/kanina/article/view/24143 |
_version_ | 1818241675874533376 |
---|---|
author | Igor Vinogradov |
author_facet | Igor Vinogradov |
author_sort | Igor Vinogradov |
collection | DOAJ |
description | A corpus of an understudied language usually has a documentary-linguistic nature and comprises all text material available in a particular language. However, without resorting to text selection, it is impossible to obtain a representative and balanced sample of language use. The lack of these two characteristics makes such a corpus almost useless for any kind of quantitative research. Nevertheless, corpora of understudied languages serve a wide range of language documentation objectives. Furthermore, they can provide evidence that particular word forms or grammatical features exist in texts that meet specific search criteria. If such corpora have well-elaborated linguistic annotation, they can complement grammatical descriptions and dictionaries, and their digital format sets them apart from ordinary text collections. They are especially suitable for typological research, which requires handling large amounts of data from different, unrelated languages. |
first_indexed | 2024-12-12T13:33:07Z |
format | Article |
id | doaj.art-56fa9ff969f7486887cbb0d097d9770b |
institution | Directory Open Access Journal |
issn | 0378-0473 2215-2636 |
language | Spanish |
last_indexed | 2024-12-12T13:33:07Z |
publishDate | 2016-05-01 |
publisher | Universidad de Costa Rica |
record_format | Article |
series | Káñina |
spelling | doaj.art-56fa9ff969f7486887cbb0d097d9770b; 2022-12-22T00:23:01Z; spa; Universidad de Costa Rica; Káñina; 0378-0473; 2215-2636; 2016-05-01; 40; 1; 10.15517/rk.v40i1.24143; Linguistic corpora of understudied languages: Do they make sense?; Igor Vinogradov (Universidad Nacional Autónoma de México. Fellow of the Instituto de Investigaciones Antropológicas.); A corpus of an understudied language usually has a documentary-linguistic nature and comprises all text material available in a particular language. However, without resorting to text selection, it is impossible to obtain a representative and balanced sample of language use. The lack of these two characteristics makes such a corpus almost useless for any kind of quantitative research. Nevertheless, corpora of understudied languages serve a wide range of language documentation objectives. Furthermore, they can provide evidence that particular word forms or grammatical features exist in texts that meet specific search criteria. If such corpora have well-elaborated linguistic annotation, they can complement grammatical descriptions and dictionaries, and their digital format sets them apart from ordinary text collections. They are especially suitable for typological research, which requires handling large amounts of data from different, unrelated languages.; https://revistas.ucr.ac.cr/index.php/kanina/article/view/24143; corpus linguistics; understudied languages; language documentation; quantitative methods |
spellingShingle | Igor Vinogradov; Linguistic corpora of understudied languages: Do they make sense?; Káñina; corpus linguistics; understudied languages; language documentation; quantitative methods |
title | Linguistic corpora of understudied languages: Do they make sense? |
title_sort | linguistic corpora of understudied languages do they make sense |
topic | corpus linguistics; understudied languages; language documentation; quantitative methods |
url | https://revistas.ucr.ac.cr/index.php/kanina/article/view/24143 |
work_keys_str_mv | AT igorvinogradov linguisticcorporaofunderstudiedlanguagesdotheymakesense |