Russian-Language Thesauri: Automated Construction and Application for Natural Language Processing Tasks

The paper reviews the existing Russian-language thesauri in digital form and methods of their automatic construction and application. The authors analyzed the main characteristics of open access thesauri for scientific research, evaluated trends of their development, and their effectiveness in solvi...

Bibliographic Details
Main Authors: Nadezhda S. Lagutina, Ksenia V. Lagutina, Aleksey S. Adrianov, Ilya V. Paramonov
Format: Article
Language: English
Published: Yaroslavl State University 2018-08-01
Series: Моделирование и анализ информационных систем (Modeling and Analysis of Information Systems)
Online Access: https://www.mais-journal.ru/jour/article/view/735
Collection: DOAJ
Description: The paper reviews existing Russian-language thesauri in digital form, along with methods for their automatic construction and application. The authors analyzed the main characteristics of open-access thesauri for scientific research and evaluated their development trends and their effectiveness in solving natural language processing tasks. They studied statistical and linguistic methods of thesaurus construction that automate development and reduce the labor costs of expert linguists. In particular, the authors considered algorithms for extracting keywords and semantic thesaurus relationships of all types, as well as the quality of thesauri generated with these tools. To illustrate the features of various methods for constructing thesaurus relationships, the authors developed a combined method that generates a specialized thesaurus fully automatically from a text corpus in a particular domain and several existing linguistic resources. Using the proposed method, they conducted experiments with two Russian-language text corpora from two subject areas: articles about migrants and tweets. The resulting thesauri were evaluated with an integrated assessment, developed in the authors' previous study, that makes it possible to analyze various aspects of a thesaurus and the quality of the generation methods. The analysis revealed the main advantages and disadvantages of various approaches to thesaurus construction and to the extraction of semantic relationships of different types, and it identified directions for future study.
ISSN: 1818-1015, 2313-5417
Volume: 25, Issue: 4, Pages: 435-458
DOI: 10.18255/1818-1015-2018-4-435-458
Author affiliation (all four authors): P.G. Demidov Yaroslavl State University (Ярославский государственный университет им. П.Г. Демидова)
Topics: thesaurus; semantic relations; automatic thesaurus construction; automatic relation extraction; keyword extraction