Transformer-Based Feature Fusion Approach for Multimodal Visual Sentiment Recognition Using Tweets in the Wild
We present an image-based real-time sentiment analysis system that can be used to recognize in-the-wild sentiment expressions on online social networks. The system deploys the newly proposed transformer architecture on online social network (OSN) big data to extract emotion and sentiment features using three types of images: images containing faces, images containing text, and images containing no faces/text.
Main Authors: | Fatimah Alzamzami, Abdulmotaleb El Saddik |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | Transformers; ViT; sentiment; online social media; transfer learning; threshold moving |
Online Access: | https://ieeexplore.ieee.org/document/10122531/ |
_version_ | 1797823426048557056 |
---|---|
author | Fatimah Alzamzami Abdulmotaleb El Saddik |
author_facet | Fatimah Alzamzami Abdulmotaleb El Saddik |
author_sort | Fatimah Alzamzami |
collection | DOAJ |
description | We present an image-based real-time sentiment analysis system that can be used to recognize in-the-wild sentiment expressions on online social networks. The system deploys the newly proposed transformer architecture on online social network (OSN) big data to extract emotion and sentiment features using three types of images: images containing faces, images containing text, and images containing no faces/text. We build three separate models, one for each type of image, and then fuse all the models to learn the online sentiment behavior. Our proposed methodology combines a supervised two-stage training approach with a threshold-moving method, which is crucial for handling the data imbalance found in OSN data. The training is carried out on existing popular datasets (i.e., for the three models) and our newly proposed dataset, the Domain Free Multimedia Sentiment Dataset (DFMSD). Our results show that introducing the threshold-moving method during training enhanced the sentiment learning performance by 5-8 percentage points compared to training without it. Combining the two-stage strategy with the threshold-moving method during training proved effective in further improving the learning performance (approximately 12% higher accuracy than the threshold-moving strategy alone). Furthermore, the proposed approach has shown a positive impact on the fusion of the three models in terms of accuracy and F-score. |
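The abstract above describes a threshold-moving step to counter the class imbalance typical of OSN data: instead of taking the argmax over the classifier's scores, the decision threshold is shifted and tuned on held-out data. The sketch below is a minimal illustration of that general idea, not the paper's actual code; the function names, the binary setup, the macro-F1 selection criterion, and the candidate grid are all assumptions for the example.

```python
import numpy as np

def predict_with_threshold(probs, threshold=0.5):
    """Binary decision: predict class 1 ("positive") only if its
    probability clears the moved threshold, else class 0 ("negative")."""
    return (probs[:, 1] >= threshold).astype(int)

def tune_threshold(probs, labels, candidates=np.linspace(0.1, 0.9, 81)):
    """Pick the threshold maximizing macro F1 on a validation split."""
    def macro_f1(y_true, y_pred):
        f1s = []
        for c in (0, 1):
            tp = np.sum((y_pred == c) & (y_true == c))
            fp = np.sum((y_pred == c) & (y_true != c))
            fn = np.sum((y_pred != c) & (y_true == c))
            p = tp / (tp + fp) if tp + fp else 0.0
            r = tp / (tp + fn) if tp + fn else 0.0
            f1s.append(2 * p * r / (p + r) if p + r else 0.0)
        return float(np.mean(f1s))
    scores = [macro_f1(labels, predict_with_threshold(probs, t))
              for t in candidates]
    return float(candidates[int(np.argmax(scores))])
```

With an imbalanced minority class, the tuned threshold typically moves below 0.5 so that weaker minority-class scores still produce minority-class predictions, which is the effect the abstract attributes to threshold moving.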
first_indexed | 2024-03-13T10:23:50Z |
format | Article |
id | doaj.art-69fb595bf4f94413aaeefda1b07f1cc6 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-13T10:23:50Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-69fb595bf4f94413aaeefda1b07f1cc6 | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2023-01-01 | vol. 11, pp. 47070-47079 | DOI: 10.1109/ACCESS.2023.3274744 | IEEE document 10122531 | Fatimah Alzamzami (https://orcid.org/0000-0002-5009-3861), Abdulmotaleb El Saddik (https://orcid.org/0000-0002-7690-8547), Multimedia Communication Research Laboratory, School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada | https://ieeexplore.ieee.org/document/10122531/ | Topics: Transformers; ViT; sentiment; online social media; transfer learning; threshold moving |
title | Transformer-Based Feature Fusion Approach for Multimodal Visual Sentiment Recognition Using Tweets in the Wild |
title_sort | transformer based feature fusion approach for multimodal visual sentiment recognition using tweets in the wild |
topic | Transformers; ViT; sentiment; online social media; transfer learning; threshold moving |
url | https://ieeexplore.ieee.org/document/10122531/ |
work_keys_str_mv | AT fatimahalzamzami transformerbasedfeaturefusionapproachformultimodalvisualsentimentrecognitionusingtweetsinthewild AT abdulmotalebelsaddik transformerbasedfeaturefusionapproachformultimodalvisualsentimentrecognitionusingtweetsinthewild |