A Multitask-Based Neural Machine Translation Model with Part-of-Speech Tags Integration for Arabic Dialects

Bibliographic Details
Main Authors: Laith H. Baniata, Seyoung Park, Seong-Bae Park
Format: Article
Language: English
Published: MDPI AG 2018-12-01
Series: Applied Sciences
Subjects: NMT; MTL; POS tagging; CRF; Bi-LSTM; Arabic dialects; encoder; decoder; MSA
Online Access: https://www.mdpi.com/2076-3417/8/12/2502
collection DOAJ
description Statistical machine translation for Arabic integrates external linguistic resources such as part-of-speech (POS) tags. This study presents a segment-level Arabic dialect POS tagger that combines a Bidirectional Long Short-Term Memory (Bi-LSTM) network with Conditional Random Fields (CRF), and integrates it into a multitask Neural Machine Translation (NMT) model. The NMT component is based on the recently introduced recurrent neural network encoder-decoder model. The unified multitask NMT model shares an encoder between two kinds of task: translation from Arabic Dialect (AD) to Modern Standard Arabic (MSA), and segment-level POS tagging. A shared layer and an invariant layer are shared between the translation tasks. By training the translation tasks and the POS tagging task alternately, the model can leverage the characteristic information and improve translation quality from Arabic dialects to MSA. Experiments cover Levantine Arabic (LA) to MSA and Maghrebi Arabic (MA) to MSA translation, with segment-level POS tags for Arabic dialects exploited as an additional linguistic resource. The results suggest that the multitask learning approach improved both translation quality and POS tagger performance.
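The alternating training schedule mentioned in the description can be illustrated with a small sketch. This is not the authors' implementation: the class `SharedEncoderModel`, the functions `alternating_schedule` and `train`, and the task names are hypothetical, and the "training step" only counts updates instead of computing gradients. It is meant only to show how a shared encoder receives an update at every step while each task-specific part is updated on its own turns.

```python
from itertools import cycle


class SharedEncoderModel:
    """Toy stand-in (assumption): counts updates rather than training weights."""

    def __init__(self, tasks):
        self.encoder_updates = 0                    # shared by all tasks
        self.head_updates = {t: 0 for t in tasks}   # one task-specific part per task

    def train_step(self, task, batch):
        # A real step would run encoder -> task-specific decoder/tagger -> loss.
        self.encoder_updates += 1
        self.head_updates[task] += 1


def alternating_schedule(task_batches, steps):
    """Yield (task, batch) pairs, switching to the next task at every step."""
    iterators = {name: cycle(batches) for name, batches in task_batches.items()}
    order = cycle(list(task_batches))  # round-robin in insertion order
    for _ in range(steps):
        name = next(order)
        yield name, next(iterators[name])


def train(task_batches, steps):
    """Run the toy model over the alternating schedule and return it."""
    model = SharedEncoderModel(task_batches)
    for task, batch in alternating_schedule(task_batches, steps):
        model.train_step(task, batch)
    return model
```

For example, `train({"translation": ["b1", "b2"], "pos_tagging": ["p1"]}, 6)` alternates between the two hypothetical tasks for six steps; a smaller dataset is simply cycled so that neither task starves.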
id doaj.art-6be72491772a43e799c47ae8611a6e50
institution Directory Open Access Journal
issn 2076-3417
spelling doaj.art-6be72491772a43e799c47ae8611a6e50
timestamp 2022-12-21T18:48:55Z
doi 10.3390/app8122502
citation Applied Sciences, volume 8, issue 12, article 2502 (2018-12-01)
affiliation Laith H. Baniata: School of Computer Science and Engineering, Kyungpook National University, 80 Daehakro, Buk-gu, Daegu 41566, Korea
Seyoung Park: School of Computer Science and Engineering, Kyungpook National University, 80 Daehakro, Buk-gu, Daegu 41566, Korea
Seong-Bae Park: Department of Computer Science and Engineering, Kyung Hee University, 1732 Deogyeong-daero, Giheung-gu, Yongin 17104, Korea
topic NMT
MTL
POS tagging
CRF
Bi-LSTM
Arabic dialects
encoder
decoder
MSA