Double Consistency Regularization for Transformer Networks

Large-scale deep neural networks based on the Transformer model are very powerful in sequence tasks, but they are prone to overfitting on small-scale training data. Moreover, the model's prediction result for an input with a small disturbance is significantly lower than that without dis...

Bibliographic Details
Main Authors: Yuxian Wan, Wenlin Zhang, Zhen Li
Format: Article
Language: English
Published: MDPI AG 2023-10-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/12/20/4357