Transfer learning and multi-phase training for accurate diacritization of Arabic poetry

Bibliographic Details
Main Authors: Gheith A. Abandah (corresponding author), Ashraf E. Suyyagh, Mohammad R. Abdel-Majeed
Affiliation: School of Engineering, The University of Jordan, Amman 11942, Jordan
Format: Article
Language: English
Published: Elsevier, 2022-06-01
Series: Journal of King Saud University: Computer and Information Sciences
Volume/Issue: Vol. 34, No. 6, pp. 3744-3757
ISSN: 1319-1578
Subjects: Arabic poetry; Automatic diacritization; Bidirectional neural network; Deep learning; Transfer learning; Multi-phase training
Collection: DOAJ (Directory of Open Access Journals)
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157822001227

Description

Most Arabic poetry is undiacritized or only partially diacritized, that is, written without some or all of its short-vowel marks. Diacritizing Arabic poetry would allow people of all ages and levels of language mastery to read and chant it easily and correctly. Moreover, diacritizing a verse is an essential step in analyzing it for classification and evaluation. Unfortunately, the available automatic poetry diacritization solutions are inaccurate. Diacritizing Arabic poetry is difficult for people and machines alike, because Arabic has numerous complex diacritization rules and Arabic poetry adds special cases and rich, vibrant compositions. Deep machine learning could provide the desired diacritization solution, provided that adequate training datasets are available; unfortunately, the available datasets are insufficient and expensive to develop. In this paper, we propose solutions that improve the automatic diacritization of Arabic poetry using deep machine learning. We mitigate the difficulty of diacritizing poetry verses by employing transfer learning to leverage pattern features from a pretrained classification model. We also overcome the shortage of training data by training the composite diacritization model in multiple phases on carefully selected sub-datasets. Compared with the best previously reported results, the proposed solutions reduce the diacritization error rate from 6.08% to 3.54% (a 42% relative improvement).
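The abstract reports its results as a diacritization error rate (DER). As a hedged illustration only, the sketch below assumes the common framing of diacritization as character-level sequence labeling (one diacritic class per base letter) and computes DER as the fraction of Arabic letters whose diacritic class differs from the reference. The helper names (`split_diacritics`, `diacritization_error_rate`, `relative_improvement`) are hypothetical, and the scoring choices (skipping spaces, punctuation, and tatweel; treating stacked marks such as shadda plus fatha as one class) are assumptions; the paper's exact DER protocol may differ. The last helper checks the abstract's arithmetic: going from 6.08% to 3.54% is a relative reduction of about 42%.

```python
# Arabic harakat as Unicode combining marks U+064B..U+0652:
# fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun.
DIACRITICS = frozenset(chr(cp) for cp in range(0x064B, 0x0653))
TATWEEL = "\u0640"  # elongation character; not a letter, so never scored


def split_diacritics(text: str) -> tuple[str, list[str]]:
    """Hypothetical helper: split text into base characters and one label each.

    A label is the (possibly empty) string of diacritic marks following a base
    character, so the model's target is one class per character.
    """
    bases: list[str] = []
    labels: list[str] = []
    for ch in text:
        if ch in DIACRITICS:
            if not bases:
                raise ValueError("text starts with a diacritic mark")
            labels[-1] += ch
        else:
            bases.append(ch)
            labels.append("")
    # Sort marks so that shadda+fatha and fatha+shadda map to the same class.
    return "".join(bases), ["".join(sorted(lab)) for lab in labels]


def is_arabic_letter(ch: str) -> bool:
    # Assumption: score only letters in U+0621..U+064A, excluding tatweel.
    return "\u0621" <= ch <= "\u064A" and ch != TATWEEL


def diacritization_error_rate(reference: str, predicted: str) -> float:
    """Fraction of scored letters whose diacritic class differs from the reference."""
    ref_bases, ref_labels = split_diacritics(reference)
    pred_bases, pred_labels = split_diacritics(predicted)
    if ref_bases != pred_bases:
        raise ValueError("reference and prediction must have identical base text")
    scored = [i for i, ch in enumerate(ref_bases) if is_arabic_letter(ch)]
    if not scored:
        return 0.0
    errors = sum(ref_labels[i] != pred_labels[i] for i in scored)
    return errors / len(scored)


def relative_improvement(old_rate: float, new_rate: float) -> float:
    """Relative reduction of an error rate, e.g. DER before vs. after."""
    return (old_rate - new_rate) / old_rate


if __name__ == "__main__":
    ref = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba: fatha on all three letters
    hyp = "\u0643\u064E\u062A\u0650\u0628\u064E"  # kasra wrongly predicted on the second letter
    print(diacritization_error_rate(ref, hyp))  # 1 wrong out of 3 letters
    print(f"{relative_improvement(6.08, 3.54):.0%}")  # prints 42%
```

Framing the task as one label per base character is what lets a bidirectional recurrent network emit a diacritic class at every position while reading context from both directions; counting only letters keeps the rate from being diluted by spaces and punctuation.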