WSD algorithm based on a new method of vector-word contexts proximity calculation via epsilon-filtration
The article considers the problem of word sense disambiguation (WSD). Sets of synonyms (synsets) and sentences containing these synonyms are taken as input. The task is to automatically select the meaning of the word in the sentence. 1285 sentences were tagged by experts, namely, one of the dictionary me...
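
The full description in this record mentions a preliminary epsilon-filtering of words, in both the sentence and the synset, before the proximity of their vector contexts is calculated. The record does not define the filter or the proximity measure, so the following Python sketch is only one possible reading and not the authors' algorithm: it assumes cosine similarity, keeps a word only if it has at least one partner on the other side with similarity of at least `eps`, and scores a synset by the mean pairwise similarity of the surviving words. The toy vectors, the function names `epsilon_filter`, `proximity`, and `disambiguate`, and the value `eps=0.9` are all hypothetical.

```python
import math

# Hypothetical 2-D "word vectors"; a real experiment would load word2vec vectors.
TOY_VECTORS = {
    "river": (1.0, 0.0), "water": (0.9, 0.1), "shore": (0.8, 0.2),
    "money": (0.0, 1.0), "loan": (0.1, 0.9), "deposit": (0.2, 0.8),
    "walk": (0.5, 0.5),
}


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def epsilon_filter(words, others, vectors, eps):
    """Assumed filter: keep a word if some word in `others` is at least eps-similar."""
    known_others = [o for o in others if o in vectors]
    return [
        w for w in words
        if w in vectors
        and any(cosine(vectors[w], vectors[o]) >= eps for o in known_others)
    ]


def proximity(sentence, synset, vectors, eps):
    """Mean pairwise cosine similarity after filtering both sides (assumed measure)."""
    kept_sentence = epsilon_filter(sentence, synset, vectors, eps)
    kept_synset = epsilon_filter(synset, sentence, vectors, eps)
    if not kept_sentence or not kept_synset:
        return 0.0
    sims = [cosine(vectors[s], vectors[t])
            for s in kept_sentence for t in kept_synset]
    return sum(sims) / len(sims)


def disambiguate(sentence, synsets, vectors, eps=0.9):
    """Return the name of the synset with the highest proximity to the sentence."""
    return max(synsets, key=lambda name: proximity(sentence, synsets[name], vectors, eps))


# Context words of a sentence containing the target word "bank" (target removed).
sentence = ["walk", "river", "water"]
synsets = {"bank_river": ["shore"], "bank_finance": ["loan", "deposit"]}
print(disambiguate(sentence, synsets, TOY_VECTORS))
```

In this toy setting the filter discards "walk", which is not close enough to any synset word at `eps=0.9`, so only the strongly related context words contribute to the score.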
Main Authors: | Andrew Krizhanovsky, Alexander Kirillov, Natalia Krizhanovskaya |
---|---|
Format: | Article |
Language: | English |
Published: | Karelian Research Centre of the Russian Academy of Sciences, 2018-06-01 |
Series: | Transactions of the Karelian Research Centre of the Russian Academy of Sciences |
Subjects: | synonym, synset, corpus linguistics, word2vec, wikisource, wsd, rusvectores, wiktionary |
Online Access: | http://journals.krc.karelia.ru/index.php/mathem/article/view/829 |
_version_ | 1819063981395935232 |
---|---|
author | Andrew Krizhanovsky, Alexander Kirillov, Natalia Krizhanovskaya |
author_facet | Andrew Krizhanovsky, Alexander Kirillov, Natalia Krizhanovskaya |
author_sort | Andrew Krizhanovsky |
collection | DOAJ |
description | The article considers the problem of word sense disambiguation (WSD). Sets of synonyms (synsets) and sentences containing these synonyms are taken as input. The task is to automatically select the meaning of the target word in each sentence. Experts tagged 1285 sentences, selecting one of the dictionary meanings for each target word. To solve the WSD problem, an algorithm based on a new method of calculating the proximity of vector-word contexts is proposed. Words are preliminarily epsilon-filtered, both in the sentence and in the set of synonyms, in order to achieve higher accuracy. An extensive program of experiments was carried out. Four algorithms were implemented, including the new one. The experiments showed that in some cases the new algorithm produces better results. The developed software and the tagged corpus are available online under an open license. Wiktionary and Wikisource are used. A brief description of this work can be viewed as slides (https://goo.gl/9ak6Gt). A video lecture in Russian about this research is available online (https://youtu.be/-DLmRkepf58). |
first_indexed | 2024-12-21T15:23:18Z |
format | Article |
id | doaj.art-ae7f9bc48f5f453d87880cfe85f6d707 |
institution | Directory Open Access Journal |
issn | 1997-3217 2312-4504 |
language | English |
last_indexed | 2024-12-21T15:23:18Z |
publishDate | 2018-06-01 |
publisher | Karelian Research Centre of the Russian Academy of Sciences |
record_format | Article |
series | Transactions of the Karelian Research Centre of the Russian Academy of Sciences |
title | WSD algorithm based on a new method of vector-word contexts proximity calculation via epsilon-filtration |
title_sort | wsd algorithm based on a new method of vector word contexts proximity calculation via epsilon filtration |
topic | synonym, synset, corpus linguistics, word2vec, wikisource, wsd, rusvectores, wiktionary |
url | http://journals.krc.karelia.ru/index.php/mathem/article/view/829 |