Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data
Despite many proposals for improving neural machine translation (NMT) for low-resource languages, the task remains difficult. The issue becomes even more complicated when the few available resources cover only a single domain. In this paper, we discuss the applicability of a source-side monolingual dataset of low-resource languages to improve the NMT system for such languages. In our experiments, we used Wolaytta–English translation as a low-resource translation task. We discuss the use of self-learning and fine-tuning approaches to improve the NMT system for Wolaytta–English translation using both authentic and synthetic datasets. The self-learning approach showed +2.7 and +2.4 BLEU score improvements for Wolaytta–English and English–Wolaytta translations, respectively, over the best-performing baseline model. Further fine-tuning the best-performing self-learning model showed additional +1.2 and +0.6 BLEU score improvements, respectively. We reflect on our contributions and outline plans for future work in this difficult field of study.
| Main Authors: | Atnafu Lambebo Tonja, Olga Kolesnikova, Alexander Gelbukh, Grigori Sidorov |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2023-01-01 |
| Series: | Applied Sciences |
| Subjects: | Wolaytta–English NMT; English–Wolaytta NMT; low-resource NMT; self-learning; neural machine translation; monolingual data for low-resource languages |
| Online Access: | https://www.mdpi.com/2076-3417/13/2/1201 |
_version_ | 1797446450019303424 |
---|---|
author | Atnafu Lambebo Tonja; Olga Kolesnikova; Alexander Gelbukh; Grigori Sidorov |
author_facet | Atnafu Lambebo Tonja; Olga Kolesnikova; Alexander Gelbukh; Grigori Sidorov |
author_sort | Atnafu Lambebo Tonja |
collection | DOAJ |
description | Despite many proposals for improving neural machine translation (NMT) for low-resource languages, the task remains difficult. The issue becomes even more complicated when the few available resources cover only a single domain. In this paper, we discuss the applicability of a source-side monolingual dataset of low-resource languages to improve the NMT system for such languages. In our experiments, we used Wolaytta–English translation as a low-resource translation task. We discuss the use of self-learning and fine-tuning approaches to improve the NMT system for Wolaytta–English translation using both authentic and synthetic datasets. The self-learning approach showed +2.7 and +2.4 BLEU score improvements for Wolaytta–English and English–Wolaytta translations, respectively, over the best-performing baseline model. Further fine-tuning the best-performing self-learning model showed additional +1.2 and +0.6 BLEU score improvements, respectively. We reflect on our contributions and outline plans for future work in this difficult field of study. |
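The self-learning recipe the abstract describes — train a baseline on the small authentic parallel corpus, forward-translate source-side (Wolaytta) monolingual text into synthetic parallel pairs, retrain on the combined data, then fine-tune on the authentic data — can be sketched as follows. This is a hypothetical illustration only: `train` and `translate` are stubs standing in for a real NMT system, and the toy sentence pairs are invented; the article does not publish an implementation.

```python
def train(parallel_pairs):
    """Stub for NMT training; a real system would fit a seq2seq model.

    Here the 'model' is just a source-to-target lookup table.
    """
    return dict(parallel_pairs)


def translate(model, sentence):
    """Stub for forward translation with the current model."""
    return model.get(sentence, "<unk>")


# Step 1: train a baseline on the small authentic parallel corpus.
authentic = [("ta", "I"), ("keettaa", "house")]  # invented toy pairs
baseline = train(authentic)

# Step 2: forward-translate source-side monolingual data to build
# synthetic parallel pairs (the self-learning step).
monolingual_wolaytta = ["ta", "keettaa"]
synthetic = [(s, translate(baseline, s)) for s in monolingual_wolaytta]

# Step 3: retrain on authentic + synthetic data, then fine-tune the
# best self-learned model on the authentic data alone.
combined = authentic + synthetic
self_learned = train(combined)
fine_tuned = train(authentic)  # fine-tuning pass, sketched as retraining

print(len(combined))  # number of authentic + synthetic pairs
```

In the paper's setting the synthetic targets are noisier than the authentic ones, which is why the final fine-tuning pass on authentic data alone recovers additional BLEU on top of the self-learned model.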
first_indexed | 2024-03-09T13:40:41Z |
format | Article |
id | doaj.art-c1e6be9241ce46189c902baa838d2f2a |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-09T13:40:41Z |
publishDate | 2023-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-c1e6be9241ce46189c902baa838d2f2a (2023-11-30T21:07:37Z); eng; MDPI AG; Applied Sciences; ISSN 2076-3417; published 2023-01-01; Vol. 13, No. 2, Article 1201; DOI 10.3390/app13021201; Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data; authors Atnafu Lambebo Tonja, Olga Kolesnikova, Alexander Gelbukh, and Grigori Sidorov, all with Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Mexico City 07738, Mexico; abstract and keywords as in the description and topic fields; https://www.mdpi.com/2076-3417/13/2/1201 |
spellingShingle | Atnafu Lambebo Tonja; Olga Kolesnikova; Alexander Gelbukh; Grigori Sidorov; Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data; Applied Sciences; Wolaytta–English NMT; English–Wolaytta NMT; low-resource NMT; self-learning; neural machine translation; monolingual data for low-resource languages |
title | Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data |
title_full | Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data |
title_fullStr | Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data |
title_full_unstemmed | Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data |
title_short | Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data |
title_sort | low resource neural machine translation improvement using source side monolingual data |
topic | Wolaytta–English NMT; English–Wolaytta NMT; low-resource NMT; self-learning; neural machine translation; monolingual data for low-resource languages |
url | https://www.mdpi.com/2076-3417/13/2/1201 |
work_keys_str_mv | AT atnafulambebotonja lowresourceneuralmachinetranslationimprovementusingsourcesidemonolingualdata AT olgakolesnikova lowresourceneuralmachinetranslationimprovementusingsourcesidemonolingualdata AT alexandergelbukh lowresourceneuralmachinetranslationimprovementusingsourcesidemonolingualdata AT grigorisidorov lowresourceneuralmachinetranslationimprovementusingsourcesidemonolingualdata |