Efficient estimation of Hindi WSD with distributed word representation in vector space

Bibliographic Details
Main Authors: Archana Kumari, D.K. Lobiyal
Format: Article
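Language: English
Published: Elsevier 2022-09-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects: Natural language processing; Word embeddings; Hindi language; Word sense disambiguation; Unsupervised learning; Clustering
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157821000720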
Abstract
Word Sense Disambiguation (WSD) is important for accurately interpreting natural language text. Various supervised learning-based and knowledge-based WSD models have been developed in the literature. However, these models do not perform well on low-resource languages because labelled, sense-tagged data are scarce. In this work, we therefore examine different word embedding techniques for WSD of Hindi text. Several studies in the literature have applied such embeddings to WSD in other languages; however, to the best of our knowledge, no such work exists for Hindi. We therefore use various existing word embeddings for Hindi WSD. Moreover, we train Hindi word embeddings on articles taken from Wikipedia and test their quality using Pearson correlation. In a series of experiments, we observe that the Word2Vec model gives the best performance among all the considered embeddings on the Hindi dataset used. The proposed model takes word embeddings as input and builds a sense inventory by clustering; this inventory is then used to perform disambiguation. Experimental observations indicate that the proposed approach achieves moderate and competitive accuracy. The paper thus shows how WSD can leverage these representations to encode rich semantic information.
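The abstract says that embedding quality is checked with Pearson correlation and that the model builds a sense inventory by clustering, then disambiguates against it. It does not name the clustering algorithm, the similarity measure, how a context is represented, or the evaluation dataset, so the Python sketch below fills those in with common choices that are assumptions, not the authors' method. Each occurrence of an ambiguous word is represented by the mean of its context-word vectors. These context vectors are grouped with plain k-means (one cluster per sense), and a new occurrence is assigned to the sense whose centroid is most cosine-similar. The 2-D vectors, the human similarity ratings, and the helper names (`cosine`, `pearson`, `context_vector`, `kmeans`, `disambiguate`) are hypothetical illustrations. Real vectors would come from a model such as Word2Vec trained on Hindi Wikipedia.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical 2-D toy vectors standing in for trained embeddings
# (e.g. Word2Vec vectors learned from Hindi Wikipedia); not real data.
TOY_EMB: Dict[str, Vector] = {
    "चांदी": [1.0, 0.1],   # silver
    "गहना": [0.9, 0.2],    # jewellery
    "कीमत": [0.8, 0.0],    # price
    "रात": [0.1, 1.0],     # night
    "बिस्तर": [0.0, 0.9],  # bed
    "नींद": [0.2, 0.8],    # sleep
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient r of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def context_vector(context: List[str], emb: Dict[str, Vector]) -> Vector:
    """Mean of the embeddings of the in-vocabulary context words."""
    vecs = [emb[w] for w in context if w in emb]
    dim = len(next(iter(emb.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Lloyd's k-means (squared Euclidean); the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def disambiguate(context: List[str], emb: Dict[str, Vector], senses: List[Vector]) -> int:
    """Index of the induced sense whose centroid is most cosine-similar to the context."""
    v = context_vector(context, emb)
    return max(range(len(senses)), key=lambda i: cosine(v, senses[i]))


if __name__ == "__main__":
    # 1) Intrinsic check: correlate model similarities with hypothetical human ratings.
    pairs = [("चांदी", "गहना"), ("चांदी", "रात"), ("नींद", "बिस्तर"), ("कीमत", "नींद")]
    human = [8.5, 1.0, 8.0, 1.5]
    model = [cosine(TOY_EMB[a], TOY_EMB[b]) for a, b in pairs]
    print("Pearson r =", round(pearson(model, human), 3))

    # 2) Sense inventory for the ambiguous word सोना (gold / to sleep):
    #    cluster the context vectors of its training occurrences.
    contexts = [["चांदी", "कीमत"], ["रात", "बिस्तर"], ["गहना", "चांदी"], ["नींद", "रात"]]
    senses = kmeans([context_vector(c, TOY_EMB) for c in contexts], k=2)

    # 3) Disambiguate new occurrences by the nearest sense centroid.
    print(disambiguate(["गहना", "कीमत"], TOY_EMB, senses))   # 0 (gold)
    print(disambiguate(["नींद", "बिस्तर"], TOY_EMB, senses))  # 1 (sleep)
```

With this toy data, the contexts built from चांदी (silver), गहना (jewellery) and कीमत (price) fall into one cluster, and the night/bed/sleep contexts fall into the other. As a result, सोना receives the "gold" sense in the first call and the "sleep" sense in the second. The number of clusters per word is a design choice the abstract leaves open; fixing k=2 here is purely illustrative.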
Affiliation: School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India (both authors; Archana Kumari is the corresponding author)
Citation: Journal of King Saud University: Computer and Information Sciences, Vol. 34, No. 8, pp. 6092-6103, September 2022
ISSN: 1319-1578