X-Transformer: A Machine Translation Model Enhanced by the Self-Attention Mechanism
Machine translation has received significant attention in natural language processing, not only because of its challenges but also because of the translation needs that arise in modern daily life. In this study, we design a new machine translation model named X-Transformer, which refines the original Transformer model in three aspects. First, the encoder's parameters are compressed. Second, the encoder structure is modified by adopting two consecutive layers of the self-attention mechanism and reducing the point-wise feed-forward layer, helping the model understand the semantic structure of sentences more precisely. Third, we streamline the decoder model size while maintaining accuracy. Through experiments, we demonstrate that a large number of decoder layers not only degrades the performance of the translation model but also increases the inference time. The X-Transformer reaches state-of-the-art results of 46.63 and 55.63 BiLingual Evaluation Understudy (BLEU) points on the Workshop on Machine Translation (WMT) 2014 English–German and English–French translation corpora, outperforming the Transformer model by 19 and 18 BLEU points, respectively. The X-Transformer also reduces the training time to only one-third of that of the Transformer. In addition, the attention heat maps of the X-Transformer reach token-level precision (i.e., token-to-token attention), while those of the Transformer model remain at the sentence level (i.e., token-to-sentence attention).
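The abstract's key architectural change is an encoder block that stacks two self-attention layers back to back while dropping the point-wise feed-forward sublayer between them. As a rough illustration only — not the authors' implementation — the idea can be sketched with single-head scaled dot-product attention in NumPy; the dimensions, the residual connections, and the omission of layer normalization and multi-head splitting are all simplifying assumptions here:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention over a token matrix x.
    q, k, v = x @ wq, x @ wk, x @ wv
    d_k = q.shape[-1]
    scores = softmax(q @ k.T / np.sqrt(d_k))  # token-to-token attention weights
    return scores @ v

rng = np.random.default_rng(0)
d_model = 8
x = rng.normal(size=(5, d_model))  # 5 tokens, d_model features (illustrative sizes)

# Two sets of random projection weights, one per attention layer.
params = [tuple(rng.normal(size=(d_model, d_model)) for _ in range(3))
          for _ in range(2)]

# X-Transformer-style encoder block: two consecutive self-attention layers
# with residual adds and no feed-forward sublayer in between (an assumption
# based on the abstract's description).
h = x
for wq, wk, wv in params:
    h = h + self_attention(h, wq, wk, wv)

print(h.shape)  # (5, 8)
```

The sketch keeps the sequence length and model width fixed through both layers, so the block can be stacked like a standard Transformer encoder layer.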
Main Authors: | Huey-Ing Liu, Wei-Lin Chen |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-04-01 |
Series: | Applied Sciences |
Subjects: | machine translation; natural language processing |
Online Access: | https://www.mdpi.com/2076-3417/12/9/4502 |
_version_ | 1797505611177394176 |
---|---|
author | Huey-Ing Liu, Wei-Lin Chen |
author_sort | Huey-Ing Liu |
collection | DOAJ |
description | Machine translation has received significant attention in natural language processing, not only because of its challenges but also because of the translation needs that arise in modern daily life. In this study, we design a new machine translation model named X-Transformer, which refines the original Transformer model in three aspects. First, the encoder's parameters are compressed. Second, the encoder structure is modified by adopting two consecutive layers of the self-attention mechanism and reducing the point-wise feed-forward layer, helping the model understand the semantic structure of sentences more precisely. Third, we streamline the decoder model size while maintaining accuracy. Through experiments, we demonstrate that a large number of decoder layers not only degrades the performance of the translation model but also increases the inference time. The X-Transformer reaches state-of-the-art results of 46.63 and 55.63 BiLingual Evaluation Understudy (BLEU) points on the Workshop on Machine Translation (WMT) 2014 English–German and English–French translation corpora, outperforming the Transformer model by 19 and 18 BLEU points, respectively. The X-Transformer also reduces the training time to only one-third of that of the Transformer. In addition, the attention heat maps of the X-Transformer reach token-level precision (i.e., token-to-token attention), while those of the Transformer model remain at the sentence level (i.e., token-to-sentence attention). |
first_indexed | 2024-03-10T04:21:05Z |
format | Article |
id | doaj.art-93565954e640469c993f9df5399e385b |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T04:21:05Z |
publishDate | 2022-04-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | Applied Sciences (ISSN 2076-3417), Vol. 12, Iss. 9, Art. 4502, 2022-04-01, DOI: 10.3390/app12094502; record doaj.art-93565954e640469c993f9df5399e385b, updated 2023-11-23T07:49:55Z. Authors: Huey-Ing Liu and Wei-Lin Chen, Electrical Engineering, Fu Jen Catholic University, No. 510 Zhongzheng Rd., Xinzhuang Dist., New Taipei City 242062, Taiwan. https://www.mdpi.com/2076-3417/12/9/4502 |
title | X-Transformer: A Machine Translation Model Enhanced by the Self-Attention Mechanism |
topic | machine translation; natural language processing |
url | https://www.mdpi.com/2076-3417/12/9/4502 |
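The record's abstract reports scores in the BLEU metric. As background for readers of this record, a simplified single-sentence, single-reference BLEU can be sketched as modified n-gram precision combined with a brevity penalty; real WMT scoring works at the corpus level and typically applies smoothing, so the function below is only an illustrative approximation:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    # Multiset of the n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    # Simplified single-sentence, single-reference BLEU (no smoothing):
    # geometric mean of 1..max_n n-gram precisions times a brevity penalty.
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum((cand & ref).values())          # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0                                    # unsmoothed: any zero kills the score
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)

cand = "the cat sat on the mat".split()
ref = "the cat sat on the mat".split()
print(round(100 * bleu(cand, ref), 2))  # 100.0 for an exact match
```

Scores like the 46.63 and 55.63 reported in the abstract are conventionally quoted on this 0–100 scale.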