Efficient estimation of Hindi WSD with distributed word representation in vector space
Word Sense Disambiguation (WSD) is important for accurately interpreting natural language text. Various supervised learning-based and knowledge-based models have been developed in the literature for WSD. However, these models do not provide goo...
Main Authors: | Archana Kumari, D.K. Lobiyal |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2022-09-01 |
Series: | Journal of King Saud University: Computer and Information Sciences |
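Subjects: | Natural language processing, Word embeddings, Hindi language, Word sense disambiguation, Unsupervised learning, Clustering |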
Online Access: | http://www.sciencedirect.com/science/article/pii/S1319157821000720 |
_version_ | 1811188554263429120 |
---|---|
author | Archana Kumari; D.K. Lobiyal |
author_facet | Archana Kumari; D.K. Lobiyal |
author_sort | Archana Kumari |
collection | DOAJ |
description | Word Sense Disambiguation (WSD) is important for accurately interpreting natural language text. Various supervised learning-based and knowledge-based models have been developed in the literature for WSD. However, these models do not give good results for low-resource languages because labelled and tagged data are scarce. In this work, we therefore examine different word embedding techniques for word sense disambiguation of Hindi text. Several studies in the literature show that such embeddings have been used for WSD in other languages; however, to the best of our knowledge, no such work exists for Hindi. We use various existing word embeddings for WSD of Hindi text. Moreover, we create Hindi word embeddings from Wikipedia articles and test their quality using Pearson correlation. Our experiments show that the Word2Vec model performs best among all the considered embeddings on the Hindi dataset used. In our method, the proposed model directly takes input trained with word embedding methods and builds a sense inventory by clustering, which is then used for disambiguation. Experimental observations indicate that the proposed approach achieves moderate and competitive accuracy. The paper thus shows how WSD can leverage these representations to encode rich semantic information. |
first_indexed | 2024-04-11T14:20:44Z |
format | Article |
id | doaj.art-05a6ab7d78974cce9f13a4df0a1fb840 |
institution | Directory Open Access Journal |
issn | 1319-1578 |
language | English |
last_indexed | 2024-04-11T14:20:44Z |
publishDate | 2022-09-01 |
publisher | Elsevier |
record_format | Article |
series | Journal of King Saud University: Computer and Information Sciences |
spelling | doaj.art-05a6ab7d78974cce9f13a4df0a1fb840; 2022-12-22T04:19:03Z; eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; 1319-1578; 2022-09-01; 34; 8; 6092; 6103; Efficient estimation of Hindi WSD with distributed word representation in vector space; Archana Kumari (corresponding author; School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India); D.K. Lobiyal (School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India); http://www.sciencedirect.com/science/article/pii/S1319157821000720; Natural language processing; Word embeddings; Hindi language; Word sense disambiguation; Unsupervised learning; Clustering |
spellingShingle | Archana Kumari; D.K. Lobiyal; Efficient estimation of Hindi WSD with distributed word representation in vector space; Journal of King Saud University: Computer and Information Sciences; Natural language processing; Word embeddings; Hindi language; Word sense disambiguation; Unsupervised learning; Clustering |
title | Efficient estimation of Hindi WSD with distributed word representation in vector space |
title_sort | efficient estimation of hindi wsd with distributed word representation in vector space |
topic | Natural language processing; Word embeddings; Hindi language; Word sense disambiguation; Unsupervised learning; Clustering |
url | http://www.sciencedirect.com/science/article/pii/S1319157821000720 |
work_keys_str_mv | AT archanakumari efficientestimationofhindiwsdwithdistributedwordrepresentationinvectorspace AT dklobiyal efficientestimationofhindiwsdwithdistributedwordrepresentationinvectorspace |