Transfer learning and multi-phase training for accurate diacritization of Arabic poetry
Main Authors: | Gheith A. Abandah, Ashraf E. Suyyagh, Mohammad R. Abdel-Majeed |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2022-06-01 |
Series: | Journal of King Saud University: Computer and Information Sciences |
Subjects: | |
Online Access: | http://www.sciencedirect.com/science/article/pii/S1319157822001227 |
_version_ | 1811333108073496576 |
---|---|
author | Gheith A. Abandah Ashraf E. Suyyagh Mohammad R. Abdel-Majeed |
collection | DOAJ |
description | Most Arabic poetry is undiacritized or partially diacritized (written without short vowels). For people of various ages and language mastery levels, diacritizing Arabic poetry would allow them to enjoy reading and chanting it easily and properly. Moreover, diacritizing a poetry verse is an essential step to analyze it for classification and evaluation. Unfortunately, the available automatic poetry diacritization solutions are inaccurate. Diacritizing Arabic poetry is a difficult task for people and machines alike because Arabic has numerous complex diacritization rules and Arabic poetry has additional special cases and rich and vibrant compositions. Deep machine learning could provide the desired diacritization solution provided that adequate training datasets are available. Unfortunately, the available datasets are insufficient and expensive to develop. In this paper, we propose solutions to improve the automatic diacritization of Arabic poetry using deep machine learning. We mitigate the difficulty of diacritizing Arabic poetry verses by employing transfer learning to leverage pattern features from a pretrained classification model. We also overcome the training dataset deficiency by training the composite diacritization model in multiple phases on carefully selected sub-datasets. Compared with the best previously known results, the proposed solutions reduce the diacritization error rate from 6.08% to 3.54% (a 42% relative improvement). |
first_indexed | 2024-04-13T16:47:00Z |
format | Article |
id | doaj.art-f02cda4a97c14eb78f030226cd91851b |
institution | Directory Open Access Journal |
issn | 1319-1578 |
language | English |
last_indexed | 2024-04-13T16:47:00Z |
publishDate | 2022-06-01 |
publisher | Elsevier |
record_format | Article |
series | Journal of King Saud University: Computer and Information Sciences |
spelling | doaj.art-f02cda4a97c14eb78f030226cd91851b; 2022-12-22T02:39:03Z; eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; 1319-1578; 2022-06-01; vol. 34, no. 6, pp. 3744–3757; Transfer learning and multi-phase training for accurate diacritization of Arabic poetry; Gheith A. Abandah (corresponding author), Ashraf E. Suyyagh, Mohammad R. Abdel-Majeed; School of Engineering, The University of Jordan, Amman 11942, Jordan; http://www.sciencedirect.com/science/article/pii/S1319157822001227; Arabic poetry; Automatic diacritization; Bidirectional neural network; Deep learning; Transfer learning; Multi-phase training |
title | Transfer learning and multi-phase training for accurate diacritization of Arabic poetry |
topic | Arabic poetry Automatic diacritization Bidirectional neural network Deep learning Transfer learning Multi-phase training |
url | http://www.sciencedirect.com/science/article/pii/S1319157822001227 |
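
The description reports diacritization error rate (DER) figures of 6.08% and 3.54%. The sketch below is a hedged illustration, not the authors' evaluation code. It assumes a common character-level DER: each diacritized string is split into base letters and per-letter diacritic labels, and the DER is the fraction of Arabic letters whose label differs from the reference. The same split yields the (undiacritized input, per-letter target) pairs that a character-level sequence labeler, such as a bidirectional network, is typically trained on. Whether the paper encodes labels this way is also an assumption. The helper names `split_diacritics`, `diacritic_error_rate` and `relative_reduction` are hypothetical. This record does not state the paper's exact scoring rules, for example how case endings or letters that carry no diacritic are counted.

```python
from typing import List, Tuple

# Assumption: the eight Arabic harakat U+064B..U+0652 (tanween, short
# vowels, shadda, sukun) are the only marks treated as diacritics.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_diacritics(text: str) -> Tuple[str, List[str]]:
    """Split diacritized text into base characters and per-character labels.

    Each base character gets the (possibly empty) string of marks that
    follow it. Marks are sorted so "shadda + fatha" and "fatha + shadda"
    compare equal. A stray mark with no preceding character is dropped.
    """
    base: List[str] = []
    labels: List[str] = []
    for ch in text:
        if ch in DIACRITICS:
            if labels:
                labels[-1] += ch
            continue
        base.append(ch)
        labels.append("")
    return "".join(base), ["".join(sorted(label)) for label in labels]


def diacritic_error_rate(reference: str, predicted: str) -> float:
    """Fraction of Arabic letters whose diacritic label differs (hypothetical DER).

    Assumption: only characters in the Arabic letter block U+0621..U+064A
    are scored. Spaces and punctuation are ignored.
    """
    ref_base, ref_labels = split_diacritics(reference)
    pred_base, pred_labels = split_diacritics(predicted)
    if ref_base != pred_base:
        raise ValueError("reference and prediction must share the same base text")
    scored = [
        (r, p)
        for ch, r, p in zip(ref_base, ref_labels, pred_labels)
        if "\u0621" <= ch <= "\u064a"
    ]
    if not scored:
        return 0.0
    return sum(r != p for r, p in scored) / len(scored)


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Relative drop from old_rate to new_rate, e.g. 0.5 for 10% -> 5%."""
    return (old_rate - new_rate) / old_rate


if __name__ == "__main__":
    reference = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kataba: fatha on all three letters
    predicted = "\u0643\u064e\u062a\u0650\u0628\u064e"  # kasra wrongly placed on the second letter
    print(diacritic_error_rate(reference, predicted))  # prints 0.3333333333333333
    print(round(relative_reduction(6.08, 3.54), 2))  # prints 0.42
```

With these helpers, `relative_reduction(6.08, 3.54)` is about 0.418. This rounds to the 42% figure quoted in the description, which is a relative reduction rather than a drop in percentage points.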