Efficient estimation of Hindi WSD with distributed word representation in vector space

Bibliographic Details
Main Authors:	Archana Kumari, D.K. Lobiyal
Format:	Article
Language:	English
Published:	Elsevier 2022-09-01
Series:	Journal of King Saud University: Computer and Information Sciences
Subjects:	Natural language processing; Word embeddings; Hindi language; Word sense disambiguation; Unsupervised learning; Clustering
Online Access:	http://www.sciencedirect.com/science/article/pii/S1319157821000720

author	Archana Kumari, D.K. Lobiyal
collection	DOAJ
description	Word Sense Disambiguation (WSD) is important for accurately interpreting natural language text. Various supervised and knowledge-based models have been developed for WSD. However, these models perform poorly on low-resource languages because labelled and tagged data are scarce. Therefore, in this work, we examine different word embedding techniques for WSD of Hindi texts. Several studies show that these embeddings have been used for WSD in other languages; however, to the best of our knowledge, no such work exists for Hindi. We therefore apply various existing word embeddings to WSD of Hindi text. We also train Hindi word embeddings on articles taken from Wikipedia and test their quality using Pearson correlation. Our experiments show that the Word2Vec model performs best among all the considered embeddings on the Hindi dataset used. The proposed model takes as input vectors trained with word embedding methods and builds a sense inventory by clustering, which is then used to perform disambiguation. Experimental observations indicate that the accuracy of the proposed approach is moderate but competent. The paper thus shows how WSD can leverage these representations to encode rich semantic information.
first_indexed	2024-04-11T14:20:44Z
format	Article
id	doaj.art-05a6ab7d78974cce9f13a4df0a1fb840
institution	Directory of Open Access Journals
issn	1319-1578
language	English
last_indexed	2024-04-11T14:20:44Z
publishDate	2022-09-01
publisher	Elsevier
record_format	Article
series	Journal of King Saud University: Computer and Information Sciences
spelling	doaj.art-05a6ab7d78974cce9f13a4df0a1fb840; 2022-12-22T04:19:03Z; eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; ISSN 1319-1578; 2022-09-01; vol. 34, no. 8, pp. 6092-6103; Efficient estimation of Hindi WSD with distributed word representation in vector space; Archana Kumari (corresponding author), School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India; D.K. Lobiyal, School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India; http://www.sciencedirect.com/science/article/pii/S1319157821000720; Natural language processing; Word embeddings; Hindi language; Word sense disambiguation; Unsupervised learning; Clustering
title	Efficient estimation of Hindi WSD with distributed word representation in vector space
topic	Natural language processing; Word embeddings; Hindi language; Word sense disambiguation; Unsupervised learning; Clustering
url	http://www.sciencedirect.com/science/article/pii/S1319157821000720

