Līdzsvarots mūsdienu latviešu valodas tekstu korpuss un tā tekstu atlases kritēriji
THE BALANCED CORPUS OF MODERN LATVIAN AND THE TEXT SELECTION CRITERIA
Summary
Recently The Balanced Corpus of Modern Latvian (~3.5 million running words) has been created in the Instit...
Main Author: | Kristīne Levāne-Petrova |
---|---|
Format: | Article |
Language: | deu |
Published: | Vilnius University, 2012-04-01 |
Series: | Baltistica |
Subjects: | The Balanced Corpus of Modern Latvian; computer linguistics |
Online Access: | http://www.baltistica.lt/index.php/baltistica/article/view/2113 |
_version_ | 1818155237661212672 |
---|---|
author | Kristīne Levāne-Petrova |
author_facet | Kristīne Levāne-Petrova |
author_sort | Kristīne Levāne-Petrova |
collection | DOAJ |
description | THE BALANCED CORPUS OF MODERN LATVIAN AND THE TEXT SELECTION CRITERIA. Summary: The Balanced Corpus of Modern Latvian (~3.5 million running words) has recently been created at the Institute of Mathematics and Computer Science (IMCS) (see http://www.korpuss.lv). The Corpus has been compiled from printed and electronic materials created after 1990. The Corpus is automatically morphologically tagged: for each token, all syntactically valid interpretations are stored. Texts for the Corpus were chosen according to several text selection criteria, for instance time, media, and domain. This article discusses the text selection criteria chosen for this Corpus, the problems related to Corpus design and text selection, the solutions found for these problems, and future plans for the Corpus. |
first_indexed | 2024-12-11T14:39:13Z |
format | Article |
id | doaj.art-1706349189e7497a8c40c25a0a9c622a |
institution | Directory Open Access Journal |
issn | 0132-6503 2345-0045 |
language | deu |
last_indexed | 2024-12-11T14:39:13Z |
publishDate | 2012-04-01 |
publisher | Vilnius University |
record_format | Article |
series | Baltistica |
spelling | doaj.art-1706349189e7497a8c40c25a0a9c622a2022-12-22T01:02:00ZdeuVilnius UniversityBaltistica0132-65032345-00452012-04-0108899810.15388/baltistica.0.8.21132006Līdzsvarots mūsdienu latviešu valodas tekstu korpuss un tā tekstu atlases kritērijiKristīne Levāne-Petrova THE BALANCED CORPUS OF MODERN LATVIAN AND THE TEXT SELECTION CRITERIA Summary Recently The Balanced Corpus of Modern Latvian (~3.5 million running words) has been created in the Institute of Mathematics and Computer Science (IMCS) (see http://www.korpuss.lv). The Corpus has been compiled from printed and electronic materials created after 1990. The Corpus is automatically morphologically tagged: for each token all the syntactically valid interpretations are stored. Texts for the Corpus were chosen according to different text selection criteria: for instance, time, media, domain, etc. This article discusses the text selection criteria chosen for this Corpus, problems related to Corpus design and text selection criteria, solutions found for these problems and future plans regarding the Corpus. http://www.baltistica.lt/index.php/baltistica/article/view/2113The Balanced Corpus of Modern Latviancomputer linguistics |
spellingShingle | Kristīne Levāne-Petrova Līdzsvarots mūsdienu latviešu valodas tekstu korpuss un tā tekstu atlases kritēriji Baltistica The Balanced Corpus of Modern Latvian computer linguistics |
title | Līdzsvarots mūsdienu latviešu valodas tekstu korpuss un tā tekstu atlases kritēriji |
title_full | Līdzsvarots mūsdienu latviešu valodas tekstu korpuss un tā tekstu atlases kritēriji |
title_fullStr | Līdzsvarots mūsdienu latviešu valodas tekstu korpuss un tā tekstu atlases kritēriji |
title_full_unstemmed | Līdzsvarots mūsdienu latviešu valodas tekstu korpuss un tā tekstu atlases kritēriji |
title_short | Līdzsvarots mūsdienu latviešu valodas tekstu korpuss un tā tekstu atlases kritēriji |
title_sort | lidzsvarots musdienu latviesu valodas tekstu korpuss un ta tekstu atlases kriteriji |
topic | The Balanced Corpus of Modern Latvian computer linguistics |
url | http://www.baltistica.lt/index.php/baltistica/article/view/2113 |
work_keys_str_mv | AT kristinelevanepetrova lidzsvarotsmusdienulatviesuvalodastekstukorpussuntatekstuatlaseskriteriji |