Dictionary of Modern Slovene: From Slovene Lexical Database to Digital Dictionary Database

The ability to process language data has become fundamental to the development of technologies in various areas of human life in the digital world. The development of digitally readable linguistic resources, methods, and tools is, therefore, also a key challenge for the contemporary Slovene language...

Bibliographic Details
Main Author: Polona Gantar
Format: Article
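Language: Croatian
Published: Institut za hrvatski jezik i jezikoslovlje 2020-01-01
Series: Rasprave Instituta za Hrvatski Jezik i Jezikoslovlje
Subjects: dictionary of modern Slovene; digital dictionary database; electronic lexicography; digital-born dictionary
Online Access: https://hrcak.srce.hr/file/356570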
_version_ 1797207159486808064
author Polona Gantar
collection DOAJ
description The ability to process language data has become fundamental to the development of technologies in various areas of human life in the digital world. The development of digitally readable linguistic resources, methods, and tools is, therefore, also a key challenge for the contemporary Slovene language. This challenge has been recognized in the Slovene language community at both the professional and state level and has been the subject of many activities over the past ten years, which are presented in this paper. The idea of a comprehensive dictionary database covering all levels of linguistic description in modern Slovene, from the morphological and lexical levels to the syntactic level, was already formulated within the framework of the European Social Fund’s Communication in Slovene (2008-2013) project; the Slovene Lexical Database was also created within the framework of this project. Two goals were pursued in designing the Slovene Lexical Database (SLD): creating linguistic descriptions of Slovene intended for human users, and making those descriptions useful for the machine processing of Slovene. Ever since the construction of the first Slovene corpus, it has been evident that there is a need for a description of modern Slovene based on real language data, and that it is necessary to understand the needs of language users in order to create useful language reference works. It also became apparent that only the digital medium enables a comprehensive language description and that the design of the database must be adapted to it from the start. In addition, the description must follow best practices as closely as possible in terms of formats and international standards, as this enables the inclusion of Slovene in a wider network of resources, such as Open Linked Data, BabelNet, and ELEXIS.
Due to time pressures and trends in lexicography, procedures to automate the extraction of linguistic data from corpora and the inclusion of crowdsourcing in the lexicographic process were taken into consideration. Following the essential idea of creating an all-inclusive digital dictionary database for Slovene, a few independent databases have been created over the past two years: the Collocations Dictionary of Modern Slovene and the automatically generated Thesaurus of Modern Slovene, both of which also exist as independent online dictionary portals. One of the novelties that we put forward together with both dictionaries is the ‘responsive dictionary’ concept, which includes crowdsourcing methods. Ultimately, the Digital Dictionary Database provides all other levels of linguistic description: the morphological level with the Sloleks database upgrade, the phraseological level with the construction of a multi-word expressions lexicon, and the syntactic level with the formalization of Slovene verb valency patterns. Each of these databases contains its own specific language data that will ultimately be included in the comprehensive Slovene Digital Dictionary Database, which will represent basic linguistic descriptions of Slovene for both human and machine users.
first_indexed 2024-04-24T09:18:29Z
format Article
id doaj.art-d708b7e0d55942fd95cb325ab7428a7e
institution Directory Open Access Journal
issn 1331-6745
1849-0379
language Croatian
last_indexed 2024-04-24T09:18:29Z
publishDate 2020-01-01
publisher Institut za hrvatski jezik i jezikoslovlje
record_format Article
series Rasprave Instituta za Hrvatski Jezik i Jezikoslovlje
spelling doaj.art-d708b7e0d55942fd95cb325ab7428a7e; 2024-04-15T16:29:46Z; hrv; Rasprave Instituta za Hrvatski Jezik i Jezikoslovlje 46 (2), 589-602; doi:10.31724/rihjj.46.2.7; Polona Gantar (Faculty of Arts, University of Ljubljana)
title Dictionary of Modern Slovene: From Slovene Lexical Database to Digital Dictionary Database
topic dictionary of modern Slovene
digital dictionary database
electronic lexicography
digital-born dictionary
url https://hrcak.srce.hr/file/356570