Kurdish Kurmanji Lemmatization and Spell-checker with Spell-correction

Bibliographic Details
Main Authors: Hanar Hoshyar Mustafa, Rebwar M. Nabi
Format: Article
Language: English
Published: University of Human Development 2023-02-01
Series: UHD Journal of Science and Technology
Subjects:
Online Access: https://journals.uhd.edu.iq/index.php/uhdjst/article/view/1076
author Hanar Hoshyar Mustafa
Rebwar M. Nabi
collection DOAJ
description Many studies have addressed lemmatization and spell-checking with spell-correction for English, Arabic, and Persian, but only a few exist for low-resource languages such as Kurdish, and fewer still for the Kurmanji dialect, which increases the need for such systems. Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form, whereas spell-checkers and spell-correctors determine whether a word is correctly spelled and correct a range of spelling errors, respectively. This research presents a lemmatization system and a word-level error correction system for the Kurmanji dialect of Kurdish, which are, to the best of our knowledge, the first such tools for this dialect. The proposed lemmatization approach is built on morphological rules, while the spell-checker and spell-corrector use a hybrid approach that relies on an n-gram language model and the Jaccard similarity coefficient. As detailed in this article, lemmatization reaches accuracy rates of 97.7% for nouns and 99.3% for verbs. The spell-checker and spell-corrector attain accuracy rates of 100% and 90.77%, respectively.
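The description names the Jaccard similarity coefficient, J(A, B) = |A ∩ B| / |A ∪ B|, as one ingredient of the spell-corrector. The following is a minimal sketch of how such a coefficient could rank correction candidates, not the authors' implementation. It assumes character bigrams with `#` boundary padding as the compared sets and a tiny illustrative lexicon. The record does not say whether the article's n-grams are character-level or word-level, and the names `char_ngrams`, `jaccard`, `is_correct`, `suggest`, and `LEXICON` are all hypothetical.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word padded with '#'."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def is_correct(word, lexicon):
    """Spell-check step: a word is accepted if it is in the lexicon."""
    return word in lexicon


def suggest(word, lexicon, n=2, top_k=3):
    """Spell-correction step: rank lexicon entries by n-gram overlap."""
    target = char_ngrams(word, n)
    scored = [(jaccard(target, char_ngrams(entry, n)), entry) for entry in lexicon]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [entry for score, entry in scored[:top_k] if score > 0]


# Illustrative lexicon only (an assumption, not the article's dataset).
LEXICON = {"pirtûk", "xwendin", "nivîsîn", "ziman"}

if __name__ == "__main__":
    word = "pirtuk"  # misspelling: 'u' typed instead of 'û'
    if not is_correct(word, LEXICON):
        print(suggest(word, LEXICON))
```

In this sketch a word is first checked against the lexicon, and only rejected words are passed to `suggest`, which mirrors the check-then-correct split the description reports separate accuracy figures for.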
first_indexed 2024-04-09T12:37:59Z
format Article
id doaj.art-c87b4b686eb74ab0a3a3b669ff94120d
institution Directory Open Access Journal
issn 2521-4209
2521-4217
language English
last_indexed 2024-04-09T12:37:59Z
publishDate 2023-02-01
publisher University of Human Development
record_format Article
series UHD Journal of Science and Technology
volume 7
issue 1
pages 43-52
doi 10.21928/uhdjst.v7n1y2023.pp43-52
author_affiliation Hanar Hoshyar Mustafa: Technical College of Informatics, Sulaimani Polytechnic University, Sulaimani, Kurdistan Region, Iraq
Rebwar M. Nabi: Technical College of Informatics, Sulaimani Polytechnic University, Sulaimani, Kurdistan Region, Iraq
topic kurdish language
kurmanji dialect
kurdish lemmatizer
kurdish spell-checker and spell-correction
kurdish dataset