Impact of Sentence Representation Matching in Neural Machine Translation
Main Authors: Heeseung Jung, Kangil Kim, Jong-Hun Shin, Seung-Hoon Na, Sangkeun Jung, Sangmin Woo
Format: Article
Language: English
Published: MDPI AG, 2022-01-01
Series: Applied Sciences
Subjects: recurrent neural network; machine translation; similarity; sentence representation; guiding pressure
Online Access: https://www.mdpi.com/2076-3417/12/3/1313
_version_ | 1797489214122622976 |
author | Heeseung Jung; Kangil Kim; Jong-Hun Shin; Seung-Hoon Na; Sangkeun Jung; Sangmin Woo
author_facet | Heeseung Jung; Kangil Kim; Jong-Hun Shin; Seung-Hoon Na; Sangkeun Jung; Sangmin Woo
author_sort | Heeseung Jung |
collection | DOAJ |
description | Most neural machine translation models are implemented within a conditional language model framework composed of encoder and decoder models. This framework learns complex and long-distance dependencies, but its deep structure makes training inefficient. Matching the vector representations of source and target sentences mitigates this inefficiency by shortening the path from parameters to costs, and it generalizes NMT models from a perspective different from the cross-entropy loss. In this paper, we propose matching methods that derive a cost from constant word-embedding vectors of the source and target sentences. To find the best method, we analyze the impact of these methods across varying structures, distance metrics, and model capacities on a French-to-English translation task. An optimally configured method is then applied to English translation tasks from and to French, Spanish, and German. On these tasks, the method improved performance by up to 3.23 BLEU, with an average improvement of 0.71. We also evaluated the robustness of the method to various embedding distributions and models, such as conventional gated structures and transformer networks, and empirical results showed that it has a high chance of improving performance in those models as well. |
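As a rough illustration of the matching cost the abstract describes, one can mean-pool the constant (non-trainable) word-embedding vectors of each sentence and penalize the distance between the pooled source and target vectors; this auxiliary cost is then added to the usual cross-entropy loss. This is only a minimal sketch of the general idea: the function name, the pooling choice, and the metric options below are assumptions, not the paper's exact formulation.

```python
import numpy as np

def sentence_matching_loss(src_emb, tgt_emb, metric="cosine"):
    """Illustrative sentence-representation matching cost (a sketch,
    not the paper's exact method).

    src_emb, tgt_emb: (sentence_length, dim) arrays of constant
    word-embedding vectors for the source and target sentences.
    Each sentence is summarized by mean-pooling its word vectors;
    the cost is the distance between the two pooled vectors.
    """
    s = src_emb.mean(axis=0)  # pooled source-sentence representation
    t = tgt_emb.mean(axis=0)  # pooled target-sentence representation
    if metric == "cosine":
        denom = np.linalg.norm(s) * np.linalg.norm(t) + 1e-8
        return 1.0 - float(s @ t) / denom
    if metric == "euclidean":
        return float(np.linalg.norm(s - t))
    raise ValueError(f"unknown metric: {metric}")
```

During training, such a cost would be weighted and added to the cross-entropy objective, giving the encoder a shorter gradient path than the full encoder-decoder stack.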
first_indexed | 2024-03-10T00:14:17Z |
format | Article |
id | doaj.art-a840124518574967a7c46dd784f8628f |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T00:14:17Z |
publishDate | 2022-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-a840124518574967a7c46dd784f8628f (indexed 2023-11-23T15:55:31Z)
Applied Sciences (MDPI AG), ISSN 2076-3417, 2022-01-01, vol. 12, iss. 3, art. 1313, doi:10.3390/app12031313
Impact of Sentence Representation Matching in Neural Machine Translation
Heeseung Jung, Kangil Kim: Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Korea
Jong-Hun Shin: Electronics and Telecommunications Research Institute (ETRI), Gwangju 61012, Korea
Seung-Hoon Na: Department of Computer Science, Jeonbuk National University, Jeonju-si 54896, Korea
Sangkeun Jung: Computer Science and Engineering, Chungnam National University, Daejeon 34134, Korea
Sangmin Woo: Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
Online access: https://www.mdpi.com/2076-3417/12/3/1313
Keywords: recurrent neural network; machine translation; similarity; sentence representation; guiding pressure |
spellingShingle | Heeseung Jung; Kangil Kim; Jong-Hun Shin; Seung-Hoon Na; Sangkeun Jung; Sangmin Woo; Impact of Sentence Representation Matching in Neural Machine Translation; Applied Sciences; recurrent neural network; machine translation; similarity; sentence representation; guiding pressure |
title | Impact of Sentence Representation Matching in Neural Machine Translation |
title_full | Impact of Sentence Representation Matching in Neural Machine Translation |
title_fullStr | Impact of Sentence Representation Matching in Neural Machine Translation |
title_full_unstemmed | Impact of Sentence Representation Matching in Neural Machine Translation |
title_short | Impact of Sentence Representation Matching in Neural Machine Translation |
title_sort | impact of sentence representation matching in neural machine translation |
topic | recurrent neural network; machine translation; similarity; sentence representation; guiding pressure |
url | https://www.mdpi.com/2076-3417/12/3/1313 |
work_keys_str_mv | AT heeseungjung impactofsentencerepresentationmatchinginneuralmachinetranslation AT kangilkim impactofsentencerepresentationmatchinginneuralmachinetranslation AT jonghunshin impactofsentencerepresentationmatchinginneuralmachinetranslation AT seunghoonna impactofsentencerepresentationmatchinginneuralmachinetranslation AT sangkeunjung impactofsentencerepresentationmatchinginneuralmachinetranslation AT sangminwoo impactofsentencerepresentationmatchinginneuralmachinetranslation |