Word Sense Induction for Russian Texts Using BERT

Bibliographic Details
Main Authors: Aleksandr Slapoguzov, Konstantin Malyuga, Evgenij Tsopa
Format: Article
Language: English
Published: FRUCT, 2021-01-01
Series: Proceedings of the XXth Conference of Open Innovations Association FRUCT
Online Access: https://www.fruct.org/publications/acm28/files/Sla.pdf
Description
Summary: This article considers an unsupervised approach, word sense induction, to resolving word sense ambiguity in natural language. Word sense disambiguation is one of the most important tasks in natural language processing, as it underlies many other tasks in the field. Sense ambiguity was resolved by clustering vector representations of words. Words were mapped to vector representations with the RuBERT language model, which was initialized from BERT and pre-trained on the Russian part of Wikipedia and on news data. The Affinity Propagation algorithm was applied for clustering; its main advantage is that it does not require the number of clusters as an input parameter. Combining this algorithm with the BERT model yielded a score of 0.81 ARI, which is comparable to other methods and can be used to resolve word sense disambiguation. The results of this work could be applied in areas such as information retrieval, information extraction, and various tasks connected with semantic networks.
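The clustering step described in the summary can be sketched as follows. This is a minimal illustration assuming scikit-learn is available; synthetic 768-dimensional vectors stand in for the RuBERT contextual embeddings of a word's occurrences (extracting real embeddings would additionally require the transformers library and a model download, so the two "senses" here are artificial). It shows the two properties the article relies on: Affinity Propagation needs no preset number of clusters, and the induced clusters can be scored against gold senses with the Adjusted Rand Index (ARI).

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
dim = 768  # hidden size of BERT-base, and hence of RuBERT embeddings

# Two artificial "senses": occurrence vectors drawn around two distant
# centroids, standing in for contextual embeddings of an ambiguous word.
centers = rng.normal(size=(2, dim)) * 5.0
vectors = np.vstack([
    centers[0] + rng.normal(scale=0.5, size=(10, dim)),
    centers[1] + rng.normal(scale=0.5, size=(10, dim)),
])
gold = [0] * 10 + [1] * 10  # gold sense labels for evaluation

# Affinity Propagation infers the number of clusters from the data;
# no n_clusters parameter is passed.
labels = AffinityPropagation(random_state=0).fit(vectors).labels_

# ARI compares the induced clustering with the gold sense labels
# (1.0 = perfect agreement, ~0.0 = chance level).
ari = adjusted_rand_score(gold, labels)
print(len(set(labels)), ari)
```

In the article's pipeline, `vectors` would instead hold RuBERT embeddings of each occurrence of the target word, and the reported 0.81 ARI is the score of that full pipeline, not of this toy setup.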
ISSN: 2305-7254, 2343-0737