A Multitask-Based Neural Machine Translation Model with Part-of-Speech Tags Integration for Arabic Dialects
The statistical machine translation for the Arabic language integrates external linguistic resources such as part-of-speech tags. The current research presents a Bidirectional Long Short-Term Memory (Bi-LSTM)—Conditional Random Fields (CRF) segment-level Arabic Dialect POS tagger model, wh...
Main Authors: | Laith H. Baniata, Seyoung Park, Seong-Bae Park |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2018-12-01 |
Series: | Applied Sciences |
Subjects: | NMT, MTL, POS tagging, CRF, Bi-LSTM, Arabic dialects, encoder, decoder, MSA |
Online Access: | https://www.mdpi.com/2076-3417/8/12/2502 |
_version_ | 1819088781975748608 |
---|---|
author | Laith H. Baniata Seyoung Park Seong-Bae Park |
author_sort | Laith H. Baniata |
collection | DOAJ |
description | Statistical machine translation for Arabic integrates external linguistic resources such as part-of-speech (POS) tags. This research presents a segment-level Arabic Dialect POS tagger that combines a Bidirectional Long Short-Term Memory (Bi-LSTM) network with Conditional Random Fields (CRF) and is integrated into a multitask Neural Machine Translation (NMT) model. The proposed NMT solution is based on the recently introduced recurrent neural network encoder-decoder model. The study proposes and develops a unified multitask NMT model that shares an encoder between two tasks: Arabic Dialect (AD) to Modern Standard Arabic (MSA) translation and segment-level POS tagging. A shared layer and an invariant layer are shared between the translation tasks. By training the translation tasks and the POS tagging task alternately, the proposed model can leverage the characteristic information and improve translation quality from Arabic dialects to MSA. Experiments are conducted on Levantine Arabic (LA) to MSA and Maghrebi Arabic (MA) to MSA translation tasks. Segment-level POS tags for Arabic dialects are also exploited as an additional linguistic resource. The experiments suggest that the multitask learning approach improves both translation quality and POS tagger performance. |
first_indexed | 2024-12-21T21:57:30Z |
format | Article |
id | doaj.art-6be72491772a43e799c47ae8611a6e50 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-12-21T21:57:30Z |
publishDate | 2018-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-6be72491772a43e799c47ae8611a6e50; 2022-12-21T18:48:55Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2018-12-01; vol. 8, no. 12, art. 2502; DOI 10.3390/app8122502; app8122502; A Multitask-Based Neural Machine Translation Model with Part-of-Speech Tags Integration for Arabic Dialects; Laith H. Baniata (School of Computer Science and Engineering, Kyungpook National University, 80 Daehakro, Buk-gu, Daegu 41566, Korea); Seyoung Park (School of Computer Science and Engineering, Kyungpook National University, 80 Daehakro, Buk-gu, Daegu 41566, Korea); Seong-Bae Park (Department of Computer Science and Engineering, Kyung Hee University, 1732 Deogyeong-daero, Giheung-gu, Yongin 17104, Korea); https://www.mdpi.com/2076-3417/8/12/2502; NMT, MTL, POS tagging, CRF, Bi-LSTM, Arabic dialects, encoder, decoder, MSA |
title | A Multitask-Based Neural Machine Translation Model with Part-of-Speech Tags Integration for Arabic Dialects |
topic | NMT MTL POS tagging CRF Bi-LSTM Arabic dialects encoder decoder MSA |
url | https://www.mdpi.com/2076-3417/8/12/2502 |
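
The description field says the model shares an encoder between translation and POS tagging and trains the tasks alternately. The sketch below illustrates only that alternating schedule, under loud assumptions: `SharedEncoderModel`, `alternate_training`, the component names, and the strict one-batch-per-task rotation are all hypothetical and are not taken from the paper. The toy model counts parameter updates instead of computing gradients, so it shows which components each step would touch rather than how the authors implemented training.

```python
from itertools import cycle


class SharedEncoderModel:
    """Toy stand-in for a multitask model (hypothetical, not the paper's code).

    It records how often each component would be updated instead of
    running a real forward/backward pass.
    """

    def __init__(self):
        self.updates = {
            "shared_encoder": 0,
            "translation_decoder": 0,
            "pos_tagger_head": 0,
        }

    def train_step(self, task, batch):
        # Every task updates the shared encoder plus its own task-specific part.
        self.updates["shared_encoder"] += 1
        if task == "translation":
            self.updates["translation_decoder"] += 1
        else:
            self.updates["pos_tagger_head"] += 1


def alternate_training(model, batches_by_task, steps):
    """Rotate through the tasks, taking one batch per step (assumed schedule)."""
    task_order = cycle(batches_by_task)  # iterates over the task names
    batch_iters = {task: cycle(batches) for task, batches in batches_by_task.items()}
    schedule = []
    for _ in range(steps):
        task = next(task_order)
        model.train_step(task, next(batch_iters[task]))
        schedule.append(task)
    return schedule


# Placeholder batches; real batches would hold dialect/MSA sentence pairs
# and segment/tag sequences.
demo_model = SharedEncoderModel()
demo_schedule = alternate_training(
    demo_model,
    {"translation": ["la_msa_batch"], "pos_tagging": ["pos_batch"]},
    steps=4,
)
print(demo_schedule)
print(demo_model.updates)
```

In this toy setup the shared encoder is updated on every step while each task-specific component is updated only on its own steps, which is the mechanism by which a tagging task could influence the translation encoder.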