Contextual word disambiguates of Ge'ez language with homophonic using machine learning
According to natural language processing experts, there are numerous ambiguous words in languages. Without automated word meaning disambiguation for any language, the development of natural language processing technologies such as information extraction, information retrieval, machine translation, a...
Main Authors: | Mequanent Degu Belete, Ayodeji Olalekan Salau, Girma Kassa Alitasb, Tigist Bezabh |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2024-06-01 |
Series: | Ampersand |
Subjects: | Ge'ez language; WSD; Text vectorization; Machine learning |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2215039024000079 |
_version_ | 1827219869339222016 |
---|---|
author | Mequanent Degu Belete; Ayodeji Olalekan Salau; Girma Kassa Alitasb; Tigist Bezabh |
author_facet | Mequanent Degu Belete; Ayodeji Olalekan Salau; Girma Kassa Alitasb; Tigist Bezabh |
author_sort | Mequanent Degu Belete |
collection | DOAJ |
description | According to natural language processing experts, there are numerous ambiguous words in languages. Without automated word sense disambiguation for a language, developing natural language processing technologies such as information extraction, information retrieval, machine translation, and others remains a challenging task. Therefore, this paper presents the development of a word sense disambiguation model for duplicate alphabet words in the Ge'ez language using corpus-based methods. Because there is no WordNet or public dataset for the Ge'ez language, 1010 samples of ambiguous words were gathered. The words were then preprocessed and the text was vectorized using bag of words, Term Frequency-Inverse Document Frequency, and word embeddings such as word2vec and fastText. The vectorized texts were then analysed using supervised machine learning algorithms such as Naive Bayes, decision trees, random forests, K-nearest neighbor, linear support vector machine, and logistic regression. Bag of words paired with random forests outperformed all other combinations, with an accuracy of 99.52%. However, when deep learning algorithms such as a deep neural network and Long Short-Term Memory were applied to the same dataset, 100% accuracy was achieved. |
first_indexed | 2024-04-25T01:01:37Z |
format | Article |
id | doaj.art-6aee73abb360413788986c053b0bf25b |
institution | Directory Open Access Journal |
issn | 2215-0390 |
language | English |
last_indexed | 2025-03-21T15:57:39Z |
publishDate | 2024-06-01 |
publisher | Elsevier |
record_format | Article |
series | Ampersand |
spelling | doaj.art-6aee73abb360413788986c053b0bf25b; 2024-06-18T04:18:03Z; eng; Elsevier; Ampersand; 2215-0390; 2024-06-01; 12; 100169; Contextual word disambiguates of Ge'ez language with homophonic using machine learning; Mequanent Degu Belete [0]; Ayodeji Olalekan Salau [1]; Girma Kassa Alitasb [2]; Tigist Bezabh [3]; [0] School of Electrical and Computer Engineering, Debre Markos Institute of Technology, Debre Markos University, Debre Markos, Ethiopia; [1] Department of Electrical/Electronics and Computer Engineering, Afe Babalola University, Ado-Ekiti, Nigeria, and Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, Tamil Nadu, India (corresponding author: Department of Electrical/Electronics and Computer Engineering, Afe Babalola University, Ado-Ekiti, Nigeria); [2] School of Electrical and Computer Engineering, Debre Markos Institute of Technology, Debre Markos University, Debre Markos, Ethiopia; [3] ICT Center, Debre Markos City, East Gojjam Zone, Ethiopia; According to natural language processing experts, there are numerous ambiguous words in languages. Without automated word sense disambiguation for a language, developing natural language processing technologies such as information extraction, information retrieval, machine translation, and others remains a challenging task. Therefore, this paper presents the development of a word sense disambiguation model for duplicate alphabet words in the Ge'ez language using corpus-based methods. Because there is no WordNet or public dataset for the Ge'ez language, 1010 samples of ambiguous words were gathered. The words were then preprocessed and the text was vectorized using bag of words, Term Frequency-Inverse Document Frequency, and word embeddings such as word2vec and fastText. The vectorized texts were then analysed using supervised machine learning algorithms such as Naive Bayes, decision trees, random forests, K-nearest neighbor, linear support vector machine, and logistic regression. Bag of words paired with random forests outperformed all other combinations, with an accuracy of 99.52%. However, when deep learning algorithms such as a deep neural network and Long Short-Term Memory were applied to the same dataset, 100% accuracy was achieved.; http://www.sciencedirect.com/science/article/pii/S2215039024000079; Ge'ez language; WSD; Text vectorization; Machine learning |
spellingShingle | Mequanent Degu Belete; Ayodeji Olalekan Salau; Girma Kassa Alitasb; Tigist Bezabh; Contextual word disambiguates of Ge'ez language with homophonic using machine learning; Ampersand; Ge'ez language; WSD; Text vectorization; Machine learning |
title | Contextual word disambiguates of Ge'ez language with homophonic using machine learning |
title_full | Contextual word disambiguates of Ge'ez language with homophonic using machine learning |
title_fullStr | Contextual word disambiguates of Ge'ez language with homophonic using machine learning |
title_full_unstemmed | Contextual word disambiguates of Ge'ez language with homophonic using machine learning |
title_short | Contextual word disambiguates of Ge'ez language with homophonic using machine learning |
title_sort | contextual word disambiguates of ge ez language with homophonic using machine learning |
topic | Ge'ez language; WSD; Text vectorization; Machine learning |
url | http://www.sciencedirect.com/science/article/pii/S2215039024000079 |
work_keys_str_mv | AT mequanentdegubelete contextualworddisambiguatesofgeezlanguagewithhomophonicusingmachinelearning AT ayodejiolalekansalau contextualworddisambiguatesofgeezlanguagewithhomophonicusingmachinelearning AT girmakassaalitasb contextualworddisambiguatesofgeezlanguagewithhomophonicusingmachinelearning AT tigistbezabh contextualworddisambiguatesofgeezlanguagewithhomophonicusingmachinelearning |
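
The description field above summarizes a corpus-based pipeline: context sentences are vectorized (for example with bag of words) and passed to a supervised classifier that predicts the intended sense. The following is a minimal sketch of that idea and not the authors' implementation. It assumes whitespace tokenization, uses Naive Bayes (one of the classifiers listed in the description) rather than the best-performing random forest, and runs on a hypothetical English toy set, because the paper's 1010-sample Ge'ez dataset is not part of this record. The class name `NaiveBayesWSD` and the sense labels are illustrative only.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Map a context sentence to token counts (assumed whitespace tokenization)."""
    return Counter(text.lower().split())


class NaiveBayesWSD:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, contexts, senses):
        self.sense_counts = Counter(senses)        # number of examples per sense
        self.token_counts = defaultdict(Counter)   # token counts per sense
        for text, sense in zip(contexts, senses):
            self.token_counts[sense].update(bag_of_words(text))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, text):
        total = sum(self.sense_counts.values())
        features = bag_of_words(text)
        best_sense, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            counts = self.token_counts[sense]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)            # log prior of the sense
            for token, freq in features.items():
                if token in self.vocab:            # tokens unseen in training are skipped
                    score += freq * math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical toy data: the ambiguous word "bank" with two senses.
train_contexts = [
    "she sat on the bank of the river",
    "the river bank was muddy after rain",
    "he deposited money at the bank",
    "the bank approved the loan",
]
train_senses = ["shore", "shore", "finance", "finance"]

model = NaiveBayesWSD().fit(train_contexts, train_senses)
prediction = model.predict("fishing from the muddy river bank")
print(prediction)
```

Swapping the classifier or the vectorizer (TF-IDF weights or word embeddings instead of raw counts) leaves the overall structure unchanged: only `bag_of_words` and the scoring inside `predict` would differ.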