Contextual word disambiguates of Ge'ez language with homophonic using machine learning

According to natural language processing experts, there are numerous ambiguous words in languages. Without automated word meaning disambiguation for any language, the development of natural language processing technologies such as information extraction, information retrieval, machine translation, a...

Bibliographic Details
Main Authors: Mequanent Degu Belete, Ayodeji Olalekan Salau, Girma Kassa Alitasb, Tigist Bezabh
Format: Article
Language: English
Published: Elsevier 2024-06-01
Series: Ampersand
Subjects: Ge'ez language, WSD, Text vectorization, Machine learning
Online Access: http://www.sciencedirect.com/science/article/pii/S2215039024000079
collection DOAJ
description According to natural language processing experts, languages contain numerous ambiguous words. Without automated word sense disambiguation, developing natural language processing technologies such as information extraction, information retrieval, machine translation, and others remains a challenging task. Therefore, this paper presents a word sense disambiguation model for duplicate alphabet words in the Ge'ez language using corpus-based methods. Because there is no WordNet or public dataset for the Ge'ez language, 1010 samples of ambiguous words were gathered. The words were then preprocessed and the text was vectorized using bag of words, Term Frequency-Inverse Document Frequency, and word embeddings such as word2vec and fastText. The vectorized texts were then analysed using supervised machine learning algorithms such as Naive Bayes, decision trees, random forests, K-nearest neighbor, linear support vector machine, and logistic regression. Bag of words paired with random forests outperformed all other combinations, with an accuracy of 99.52%. However, when deep learning algorithms such as a deep neural network and Long Short-Term Memory were applied to the same dataset, 100% accuracy was achieved.
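The pipeline the description outlines (vectorize the context of an ambiguous word, then train a supervised classifier over the vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: it pairs bag of words with Naive Bayes, one of the listed algorithms, rather than the best-performing random forest, because it fits in a few lines of standard-library Python. The class name `NaiveBayesWSD`, the helper `bag_of_words`, and the English "bank" sentences are hypothetical placeholders; the paper's 1010 Ge'ez samples are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Map a context sentence to token counts (a sparse bag-of-words vector)."""
    return Counter(text.lower().split())


class NaiveBayesWSD:
    """Hypothetical multinomial Naive Bayes sense classifier with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (an assumption, not from the paper)

    def fit(self, contexts, senses):
        self.sense_counts = Counter(senses)
        self.token_counts = defaultdict(Counter)
        for text, sense in zip(contexts, senses):
            self.token_counts[sense].update(bag_of_words(text))
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        return self

    def predict(self, text):
        bow = bag_of_words(text)
        total_docs = sum(self.sense_counts.values())
        best_sense, best_score = None, -math.inf
        for sense, n_docs in self.sense_counts.items():
            counts = self.token_counts[sense]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior of the sense
            for token, freq in bow.items():
                if token in self.vocab:  # tokens never seen in training are ignored
                    score += freq * math.log((counts[token] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Toy placeholder data: English contexts for the ambiguous word "bank".
train_contexts = [
    "she deposited money at the bank",
    "the bank approved the loan",
    "they fished from the river bank",
    "the boat drifted to the muddy bank",
]
train_senses = ["finance", "finance", "river", "river"]

model = NaiveBayesWSD().fit(train_contexts, train_senses)
print(model.predict("he opened an account at the bank to save money"))
print(model.predict("the muddy river"))
```

With a real corpus, the same structure applies: each sample is the sentence surrounding the ambiguous word, the label is its intended sense, and the vectorizer or classifier can be swapped for any of the alternatives the description lists.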
first_indexed 2024-04-25T01:01:37Z
id doaj.art-6aee73abb360413788986c053b0bf25b
institution Directory Open Access Journal
issn 2215-0390
last_indexed 2025-03-21T15:57:39Z
spelling doaj.art-6aee73abb360413788986c053b0bf25b; 2024-06-18T04:18:03Z; eng; Elsevier; Ampersand, vol. 12, article 100169; ISSN 2215-0390; 2024-06-01
affiliation Mequanent Degu Belete: School of Electrical and Computer Engineering, Debre Markos Institute of Technology, Debre Markos University, Debre Markos, Ethiopia
Ayodeji Olalekan Salau (corresponding author): Department of Electrical/Electronics and Computer Engineering, Afe Babalola University, Ado-Ekiti, Nigeria; Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, Tamil Nadu, India
Girma Kassa Alitasb: School of Electrical and Computer Engineering, Debre Markos Institute of Technology, Debre Markos University, Debre Markos, Ethiopia
Tigist Bezabh: ICT Center, Debre Markos City, East Gojjam Zone, Ethiopia
topic Ge'ez language
WSD
Text vectorization
Machine learning