A Semantic Enhancement Framework for Multimodal Sarcasm Detection

Sarcasm is a form of language in which a discrepancy lies between the literal meaning and the implied intention. Detecting sarcasm from unimodal text alone is challenging without a clear understanding of the context, which motivates the introduction of multimodal information to benefit detection. However, current approaches focus only on modeling text–image incongruity at the token level and use that incongruity as the key to detection, ignoring the significance of the overall multimodal features and of textual semantics during processing. Moreover, semantic information from other samples with a similar manner of expression also facilitates sarcasm detection. In this work, a semantic enhancement framework is proposed that addresses image–text congruity by modeling textual and visual information at the multi-scale and multi-span token level. The efficacy of textual semantics in multimodal sarcasm detection is pronounced. To bridge the cross-modal semantic gap, semantic enhancement is performed using a multiple contrastive learning strategy. Experiments were conducted on a benchmark dataset. The model outperforms the latest baseline by 1.87% in terms of F1-score and by 1% in terms of accuracy.
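The abstract's "multiple contrastive learning strategy" for bridging the cross-modal semantic gap is not spelled out in this record. As a rough illustration of the general idea, the sketch below implements a standard symmetric InfoNCE-style contrastive loss between paired text and image embeddings; the function name, temperature value, and NumPy formulation are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def cross_modal_info_nce(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss (illustrative sketch).

    text_emb, image_emb: (batch, dim) arrays where row i of each
    matrix forms a matched text-image pair.
    """
    # L2-normalise so the dot product becomes cosine similarity
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)

    logits = t @ v.T / temperature     # (batch, batch) similarity matrix
    labels = np.arange(len(t))         # matched pairs lie on the diagonal

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Pull matched pairs together and push mismatched pairs apart,
    # averaged over both text-to-image and image-to-text directions.
    return 0.5 * (cross_entropy(logits, labels)
                  + cross_entropy(logits.T, labels))
```

Minimising this loss aligns the two modalities in a shared embedding space, which is the usual mechanism by which contrastive objectives reduce a cross-modal semantic gap.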

Bibliographic Details
Main Authors: Weiyu Zhong, Zhengxuan Zhang, Qiaofeng Wu, Yun Xue, Qianhua Cai (School of Electronics and Information Engineering, South China Normal University, Foshan 528225, China)
Format: Article
Language: English
Published: MDPI AG, 2024-01-01
Series: Mathematics, Vol. 12, No. 2, Article 317
ISSN: 2227-7390
DOI: 10.3390/math12020317
Subjects: multimodal sarcasm detection; contrastive learning; graph neural networks; social media
Online Access: https://www.mdpi.com/2227-7390/12/2/317