Normalization of Arabic Dialects into Modern Standard Arabic using BERT and GPT-2

We present an encoder-decoder model for the normalization of Arabic dialects that uses both BERT- and GPT-2-based models. Arabic is a language of many dialects that differ from Modern Standard Arabic (MSA) not only in pronunciation but also in morphology, grammar and lexical choic...

Bibliographic Details
Main Authors: Khalid Alnajjar, Mika Hämäläinen
Format: Article
Language: English
Published: Nicolas Turenne 2024-04-01
Series: Journal of Data Mining and Digital Humanities
Online Access: https://jdmdh.episciences.org/13146/pdf