Double Consistency Regularization for Transformer Networks
Large-scale, deep neural networks based on the Transformer architecture are very powerful on sequence tasks, but they are prone to overfitting when the training data are small. Moreover, the model's prediction quality drops significantly when a small perturbation is added to the input...
Main Authors: | Yuxian Wan, Wenlin Zhang, Zhen Li |
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-10-01 |
Series: | Electronics |
Subjects: | cross-entropy loss; deep neural network; KL divergence; overfitting; transformer; regularization |
Online Access: | https://www.mdpi.com/2079-9292/12/20/4357 |
_version_ | 1797573953667989504 |
author | Yuxian Wan; Wenlin Zhang; Zhen Li
author_facet | Yuxian Wan; Wenlin Zhang; Zhen Li
author_sort | Yuxian Wan |
collection | DOAJ |
description | Large-scale, deep neural networks based on the Transformer architecture are very powerful on sequence tasks, but they are prone to overfitting when the training data are small. Moreover, the model's prediction quality drops significantly when a small perturbation is added to the input. In this work, we propose a double consistency regularization (DOCR) method for end-to-end model structures that separately constrains the outputs of the encoder and the decoder during training to alleviate these problems. Specifically, on top of the cross-entropy loss, we build a mean model by averaging the model parameters from previous training rounds, and we measure the consistency between the mean model and the base model by computing the KL divergence between the features output by their encoders and between the probability distributions output by their decoders, thereby imposing regularization constraints on the model's solution space. We conducted extensive experiments on machine translation tasks, and the results show that the BLEU score increased by 2.60 on average, demonstrating the effectiveness of DOCR in improving model performance and its complementary effects with other regularization techniques.
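The description above outlines the DOCR training objective only at a high level. The following is a minimal PyTorch-style sketch of one plausible reading of it. The EMA decay, the loss weights alpha and beta, the assumption that the model returns both encoder features and decoder logits, and the softmax normalization of encoder features before the KL term are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the DOCR objective as described in the abstract above.
# Hypothetical names throughout: ema_update, docr_loss, alpha, and beta are
# illustrative, not taken from the authors' code.
import torch
import torch.nn.functional as F


def ema_update(mean_model, base_model, decay=0.999):
    """Fold the base model's weights into the 'mean model' by exponential
    averaging, one plausible reading of 'integrating the model parameters
    of the previous rounds'."""
    with torch.no_grad():
        for p_mean, p_base in zip(mean_model.parameters(), base_model.parameters()):
            p_mean.mul_(decay).add_(p_base, alpha=1.0 - decay)


def docr_loss(base_model, mean_model, src, tgt_in, tgt_out, alpha=1.0, beta=1.0):
    # Forward passes; the mean model only provides targets, so no gradients.
    # Both models are assumed to return (encoder_features, decoder_logits).
    enc_b, logits_b = base_model(src, tgt_in)
    with torch.no_grad():
        enc_m, logits_m = mean_model(src, tgt_in)

    # Standard cross-entropy on the base model's decoder output.
    ce = F.cross_entropy(logits_b.reshape(-1, logits_b.size(-1)), tgt_out.reshape(-1))

    # Encoder-side consistency: KL divergence between encoder features,
    # softmax-normalized so they can be treated as distributions (assumption).
    kl_enc = F.kl_div(F.log_softmax(enc_b, dim=-1),
                      F.softmax(enc_m, dim=-1), reduction="batchmean")

    # Decoder-side consistency: KL divergence between the two models'
    # output probability distributions.
    kl_dec = F.kl_div(F.log_softmax(logits_b, dim=-1),
                      F.softmax(logits_m, dim=-1), reduction="batchmean")

    return ce + alpha * kl_enc + beta * kl_dec
```

In a training loop, ema_update(mean_model, base_model) would be called after each optimizer step; only the base model is updated by backpropagation. Whether the paper uses an exponential or a simple running average of previous rounds is not specified in this record.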
first_indexed | 2024-03-10T21:17:19Z |
format | Article |
id | doaj.art-2c9c72bd06e445cf9cb52aa5beb7e921 |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-10T21:17:19Z |
publishDate | 2023-10-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | Electronics, Vol. 12, Iss. 20, Art. 4357 (2023-10-01), MDPI AG, ISSN 2079-9292, DOI 10.3390/electronics12204357. Authors: Yuxian Wan, Wenlin Zhang, Zhen Li (School of Information System Engineering, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China).
spellingShingle | Yuxian Wan; Wenlin Zhang; Zhen Li; Double Consistency Regularization for Transformer Networks; Electronics; cross-entropy loss; deep neural network; KL divergence; overfitting; transformer; regularization
title | Double Consistency Regularization for Transformer Networks |
title_full | Double Consistency Regularization for Transformer Networks |
title_fullStr | Double Consistency Regularization for Transformer Networks |
title_full_unstemmed | Double Consistency Regularization for Transformer Networks |
title_short | Double Consistency Regularization for Transformer Networks |
title_sort | double consistency regularization for transformer networks |
topic | cross-entropy loss; deep neural network; KL divergence; overfitting; transformer; regularization
url | https://www.mdpi.com/2079-9292/12/20/4357 |
work_keys_str_mv | AT yuxianwan doubleconsistencyregularizationfortransformernetworks AT wenlinzhang doubleconsistencyregularizationfortransformernetworks AT zhenli doubleconsistencyregularizationfortransformernetworks |