WSD algorithm based on a new method of vector-word contexts proximity calculation via epsilon-filtration

The article considers the problem of word sense disambiguation (WSD). Sets of synonyms (synsets) and sentences containing these synonyms are taken. The task is to automatically select the meaning of the word in the sentence. 1285 sentences were tagged by experts, namely, one of the dictionary me...

Bibliographic Details
Main Authors: Andrew Krizhanovsky, Alexander Kirillov, Natalia Krizhanovskaya
Format: Article
Language: English
Published: Karelian Research Centre of the Russian Academy of Sciences 2018-06-01
Series: Transactions of the Karelian Research Centre of the Russian Academy of Sciences
Subjects: synonym, synset, corpus linguistics, word2vec, wikisource, wsd, rusvectores, wiktionary
Online Access: http://journals.krc.karelia.ru/index.php/mathem/article/view/829
_version_ 1819063981395935232
author Andrew Krizhanovsky
Alexander Kirillov
Natalia Krizhanovskaya
collection DOAJ
description The article considers the problem of word sense disambiguation (WSD). Sets of synonyms (synsets) and sentences containing these synonyms are taken. The task is to automatically select the meaning of the word in the sentence. Experts tagged 1285 sentences, selecting one of the dictionary meanings for each target word. To solve the WSD problem, an algorithm based on a new method of calculating the proximity of vector-word contexts is proposed. To achieve higher accuracy, a preliminary epsilon-filtering of words is performed, both in the sentence and in the set of synonyms. An extensive series of experiments was carried out with four implemented algorithms, including the new one. The experiments showed that in some cases the new algorithm produces better results. The developed software and the tagged corpus are released under an open license and are available online. Wiktionary and Wikisource are used. A brief description of this work is available as slides (https://goo.gl/9ak6Gt). A video lecture in Russian about this research is available online (https://youtu.be/-DLmRkepf58).
first_indexed 2024-12-21T15:23:18Z
format Article
id doaj.art-ae7f9bc48f5f453d87880cfe85f6d707
institution Directory Open Access Journal
issn 1997-3217
2312-4504
language English
last_indexed 2024-12-21T15:23:18Z
publishDate 2018-06-01
publisher Karelian Research Centre of the Russian Academy of Sciences
record_format Article
series Transactions of the Karelian Research Centre of the Russian Academy of Sciences
spelling doaj.art-ae7f9bc48f5f453d87880cfe85f6d707
2022-12-21T18:58:59Z
7
10.17076/mat829
613
Andrew Krizhanovsky, Institute of Applied Mathematical Research of the Karelian Research Centre of the Russian Academy of Sciences
Alexander Kirillov, Institute of Applied Mathematical Research of the Karelian Research Centre of the Russian Academy of Sciences
Natalia Krizhanovskaya, Institute of Applied Mathematical Research of the Karelian Research Centre of the Russian Academy of Sciences
title WSD algorithm based on a new method of vector-word contexts proximity calculation via epsilon-filtration
topic synonym
synset
corpus linguistics
word2vec
wikisource
wsd
rusvectores
wiktionary
url http://journals.krc.karelia.ru/index.php/mathem/article/view/829