Multimodal Sentiment Analysis Based on Adaptive Gated Information Fusion

Bibliographic Details
Main Authors: CHEN Zhen, PU Yuanyuan, ZHAO Zhengpeng, XU Dan, QIAN Wenhua
Affiliations: 1. College of Information Science and Engineering, Yunnan University, Kunming 650504, China; 2. University Key Laboratory of Internet of Things Technology and Application, Yunnan Province, Kunming 650504, China
Format: Article
Language: Chinese (zho)
Published: Editorial office of Computer Science, 2023-03-01
Series: Jisuanji kexue (Computer Science), Vol. 50, No. 3, pp. 298-306
ISSN: 1002-137X
DOI: 10.11896/jsjkx.220100156
Subjects: multimodal sentiment analysis; gated information fusion networks; iterative attention; ERNIE; auto-fusion network
Online Access: https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2023-50-3-298.pdf
Description:
The goal of multimodal sentiment analysis is to achieve reliable and robust sentiment analysis by exploiting the complementary information provided by multiple modalities. Recently, extracting deep semantic features with neural networks has achieved remarkable results in multimodal sentiment analysis, but fusing features from different levels of multimodal information is also a key factor in the effectiveness of sentiment analysis. This paper therefore proposes a multimodal sentiment analysis model based on adaptive gated information fusion (AGIF). First, the different levels of visual and color features extracted by Swin Transformer and ResNet are fused organically through a gated information fusion network, weighted by each feature's contribution to sentiment analysis. Second, because sentiment is abstract and complex, the sentiment of an image is often expressed by several subtle local regions; these sentiment-discriminative regions can be located accurately by iterative attention conditioned on past information. The ERNIE pre-trained language model is used to handle word polysemy, which static embeddings such as Word2Vec and GloVe cannot capture. Finally, an auto-fusion network fuses the features of each modality dynamically, avoiding the information redundancy caused when a deterministic operation (concatenation or TFN) is used to construct the multimodal joint representation. Extensive experiments on three publicly available real-world datasets demonstrate the effectiveness of the proposed model.
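
To make the gated fusion step concrete, below is a minimal PyTorch sketch of fusing two visual streams (e.g., Swin Transformer semantic features and ResNet color features) with a learned, input-dependent gate. The class name, dimensions, and single-gate design are illustrative assumptions, not the authors' released AGIF implementation.

```python
import torch
import torch.nn as nn

class GatedInformationFusion(nn.Module):
    """Fuse two feature vectors with a learned, input-dependent gate."""
    def __init__(self, dim_a: int, dim_b: int, dim_out: int):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, dim_out)   # project stream A
        self.proj_b = nn.Linear(dim_b, dim_out)   # project stream B
        # The gate sees both streams and emits per-dimension weights in (0, 1).
        self.gate = nn.Sequential(
            nn.Linear(dim_a + dim_b, dim_out),
            nn.Sigmoid(),
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([feat_a, feat_b], dim=-1))
        # Convex combination: g weights stream A, (1 - g) weights stream B, so
        # each output dimension leans toward the stream that contributes more.
        return g * self.proj_a(feat_a) + (1 - g) * self.proj_b(feat_b)

# Example with assumed sizes: 768-d Swin features, 2048-d ResNet features.
fusion = GatedInformationFusion(768, 2048, 512)
fused = fusion(torch.randn(4, 768), torch.randn(4, 2048))
print(fused.shape)  # torch.Size([4, 512])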
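
The iterative-attention idea, locating sentiment-discriminative regions step by step while conditioning on what earlier steps found, can be sketched in the same spirit. Here a GRU cell carries the "past information"; the step count and dimensions are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class IterativeAttention(nn.Module):
    """Refine a query over several attention steps across image regions."""
    def __init__(self, dim: int, steps: int = 3):
        super().__init__()
        self.steps = steps
        self.attn = nn.Linear(dim * 2, 1)    # scores a (query, region) pair
        self.update = nn.GRUCell(dim, dim)   # folds attended evidence into the query

    def forward(self, regions: torch.Tensor) -> torch.Tensor:
        # regions: (batch, num_regions, dim), e.g. a flattened CNN feature map
        q = regions.mean(dim=1)              # initial query: global average
        for _ in range(self.steps):
            q_exp = q.unsqueeze(1).expand_as(regions)
            scores = self.attn(torch.cat([q_exp, regions], dim=-1)).softmax(dim=1)
            attended = (scores * regions).sum(dim=1)   # weighted local evidence
            q = self.update(attended, q)               # condition on past steps
        return q                                       # sentiment-focused representation

att = IterativeAttention(512)
out = att(torch.randn(4, 49, 512))   # e.g., a 7x7 feature map flattened
print(out.shape)                     # torch.Size([4, 512])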
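
Finally, one common way to realize an auto-fusion network is to compress the concatenated modality features into a compact latent code and add a reconstruction loss, so the fused code keeps only the information needed to rebuild all modalities instead of a redundant concatenation. The sketch below follows that formulation under assumed dimensions; it is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoFusion(nn.Module):
    """Compress concatenated modality features into a fused latent code."""
    def __init__(self, dims, latent_dim: int):
        super().__init__()
        total = sum(dims)
        self.encode = nn.Sequential(nn.Linear(total, latent_dim), nn.Tanh())
        self.decode = nn.Linear(latent_dim, total)

    def forward(self, feats):
        concat = torch.cat(feats, dim=-1)     # raw joint representation
        z = self.encode(concat)               # compact fused code
        # Reconstruction loss encourages z to retain all non-redundant content;
        # train with task_loss + recon_loss.
        recon_loss = F.mse_loss(self.decode(z), concat)
        return z, recon_loss

# Example with assumed sizes: text 768-d, visual 512-d, color 256-d.
auto = AutoFusion([768, 512, 256], 256)
z, aux = auto([torch.randn(4, 768), torch.randn(4, 512), torch.randn(4, 256)])
print(z.shape, aux.item() >= 0)  # torch.Size([4, 256]) True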