Enhancing the Generalization for Text Classification through Fusion of Backward Features

Bibliographic Details
Main Authors: Dewen Seng, Xin Wu
Format: Article
Language: English
Published: MDPI AG, 2023-01-01
Series: Sensors
Subjects: deep learning; text classification; two-stream networks; feature fusion; sentiment classification; sarcasm detection
Online Access: https://www.mdpi.com/1424-8220/23/3/1287
collection DOAJ
description Generalization has always been a central concern in deep learning. Pretrained models and domain adaptation techniques have received widespread attention as ways to improve generalization; both focus on finding features in the data that improve generalization and prevent overfitting. Although they achieve good results on various tasks, these models are unstable when classifying a sentence whose label is positive but that still contains negative phrases. In this article, we analyze the attention heat maps of the benchmarks and find that previous models attend more to individual phrases than to the semantic information of the whole sentence. We therefore propose a method that scatters attention away from opposite-sentiment words to avoid a one-sided judgment. We design a two-stream network and stack a gradient reversal layer and a feature projection layer within the auxiliary network. The gradient reversal layer reverses the gradient of the features during training, so the parameters are optimized along the reversed gradient in the backpropagation stage. The auxiliary network extracts these backward features, which are then fed into the main network and merged with the normal features extracted by the main network. We apply this method to three baselines, TextCNN, BERT, and RoBERTa, on sentiment analysis and sarcasm detection datasets. The results show that our method improves accuracy on the sentiment analysis datasets by 0.5% and on the sarcasm detection datasets by 2.1%.
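The gradient reversal trick described in the abstract (identity in the forward pass, negated gradient in the backward pass) can be sketched with a hand-computed backward pass on a toy scalar model. This is an illustrative sketch only, not the authors' implementation; the function names (`grl_forward`, `grl_backward`) and the toy loss are our own assumptions.

```python
def grl_forward(x):
    # Gradient reversal layer acts as the identity in the forward pass.
    return x

def grl_backward(grad_output, lambda_=1.0):
    # In the backward pass the gradient is negated (and optionally scaled),
    # so parameters behind the layer are pushed in the opposite direction.
    return -lambda_ * grad_output

# Toy example: shared feature f = w * x, auxiliary loss L = 0.5 * (grl(f) - y)^2
w, x, y = 2.0, 3.0, 1.0
f = w * x               # shared feature: 6.0
g = grl_forward(f)      # identity forward: 6.0
loss = 0.5 * (g - y) ** 2

# Backward pass by hand (chain rule):
dL_dg = g - y                 # dL/dg = 5.0
dL_df = grl_backward(dL_dg)   # gradient reversed: -5.0
dL_dw = dL_df * x             # dL/dw = -15.0 (instead of +15.0 without the GRL)
print(dL_dw)  # -15.0
```

In the paper's two-stream setup this reversed gradient trains the auxiliary branch, whose "backward features" are then fused with the main network's features; in an autograd framework such as PyTorch the same effect is typically obtained with a custom autograd function.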
first_indexed 2024-03-11T09:26:22Z
id doaj.art-aa118c2a59f344d1a14a3a96b74314dd
institution Directory Open Access Journal
issn 1424-8220
last_indexed 2024-03-11T09:26:22Z
doi 10.3390/s23031287
citation Sensors, vol. 23, no. 3, art. 1287, 2023-01-01
affiliation Dewen Seng: School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310005, China
affiliation Xin Wu: School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310005, China
topic deep learning
text classification
two-stream networks
feature fusion
sentiment classification
sarcasm detection