A Semantic Enhancement Framework for Multimodal Sarcasm Detection
Sarcasm is a form of language in which the literal meaning diverges from the intended meaning. Detecting sarcasm from unimodal text alone is challenging without a clear understanding of the context, which motivates the introduction of multimodal information. However, current approaches...
Main Authors: | Weiyu Zhong, Zhengxuan Zhang, Qiaofeng Wu, Yun Xue, Qianhua Cai |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-01-01 |
Series: | Mathematics |
Subjects: | multimodal sarcasm detection; contrastive learning; graph neural networks; social media |
Online Access: | https://www.mdpi.com/2227-7390/12/2/317 |
_version_ | 1797342982945374208 |
author | Weiyu Zhong; Zhengxuan Zhang; Qiaofeng Wu; Yun Xue; Qianhua Cai
author_facet | Weiyu Zhong; Zhengxuan Zhang; Qiaofeng Wu; Yun Xue; Qianhua Cai
author_sort | Weiyu Zhong |
collection | DOAJ |
description | Sarcasm is a form of language in which the literal meaning diverges from the intended meaning. Detecting sarcasm from unimodal text alone is challenging without a clear understanding of the context, which motivates the introduction of multimodal information. However, current approaches focus only on modeling text–image incongruity at the token level and treat that incongruity as the key to detection, overlooking the significance of the overall multimodal features and of textual semantics. Moreover, semantic information from other samples with a similar manner of expression also facilitates sarcasm detection. In this work, a semantic enhancement framework is proposed that addresses image–text congruity by modeling textual and visual information at the multi-scale and multi-span token level. The efficacy of textual semantics in multimodal sarcasm detection is pronounced. To bridge the cross-modal semantic gap, semantic enhancement is performed with a multiple contrastive learning strategy. In experiments on a benchmark dataset, the model outperforms the latest baseline by 1.87% in F1-score and 1% in accuracy. |
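The abstract mentions bridging the cross-modal semantic gap with a contrastive learning strategy but does not spell out the mechanism. A common instantiation of cross-modal contrastive alignment is a symmetric InfoNCE loss over paired text/image embeddings; the sketch below illustrates that generic technique only, under assumed names and a hypothetical temperature value, and is not a reproduction of the paper's actual multiple-contrastive-learning objective.

```python
import numpy as np

def info_nce_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired text/image embeddings.

    Matched (text_i, image_i) pairs are positives; every other pairing
    in the batch serves as a negative. All names and the temperature
    are illustrative assumptions, not values from the paper.
    """
    # L2-normalise so dot products become cosine similarities
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(t))              # positives lie on the diagonal

    def cross_entropy(lg, lb):
        # numerically stable log-softmax per row
        shifted = lg - lg.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # symmetric: text-to-image and image-to-text directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))
loss_random = info_nce_loss(text, rng.normal(size=(4, 8)))
loss_aligned = info_nce_loss(text, text)  # perfectly aligned pairs score lower
```

Minimising such a loss pulls matched text and image representations together in a shared space while pushing mismatched pairs apart, which is the general intuition behind contrastive cross-modal semantic enhancement.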
first_indexed | 2024-03-08T10:41:05Z |
format | Article |
id | doaj.art-321b92dff8f74ca5a6c8354fcc5ca8df |
institution | Directory Open Access Journal |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-08T10:41:05Z |
publishDate | 2024-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj.art-321b92dff8f74ca5a6c8354fcc5ca8df2024-01-26T17:33:28ZengMDPI AGMathematics2227-73902024-01-0112231710.3390/math12020317A Semantic Enhancement Framework for Multimodal Sarcasm DetectionWeiyu Zhong0Zhengxuan Zhang1Qiaofeng Wu2Yun Xue3Qianhua Cai4School of Electronics and Information Engineering, South China Normal University, Foshan 528225, ChinaSchool of Electronics and Information Engineering, South China Normal University, Foshan 528225, ChinaSchool of Electronics and Information Engineering, South China Normal University, Foshan 528225, ChinaSchool of Electronics and Information Engineering, South China Normal University, Foshan 528225, ChinaSchool of Electronics and Information Engineering, South China Normal University, Foshan 528225, ChinaSarcasm represents a language form where a discrepancy lies between the literal meanings and implied intention. Sarcasm detection is challenging with unimodal text without clearly understanding the context, based on which multimodal information is introduced to benefit detection. However, current approaches only focus on modeling text–image incongruity at the token level and use the incongruity as the key to detection, ignoring the significance of the overall multimodal features and textual semantics during processing. Moreover, semantic information from other samples with a similar manner of expression also facilitates sarcasm detection. In this work, a semantic enhancement framework is proposed to address image–text congruity by modeling textual and visual information at the multi-scale and multi-span token level. The efficacy of textual semantics in multimodal sarcasm detection is pronounced. Aiming to bridge the cross-modal semantic gap, semantic enhancement is performed by using a multiple contrastive learning strategy. Experiments were conducted on a benchmark dataset. Our model outperforms the latest baseline by 1.87% in terms of the F1-score and 1% in terms of accuracy.https://www.mdpi.com/2227-7390/12/2/317multimodal sarcasm detectioncontrastive learninggraph neural networkssocial media |
spellingShingle | Weiyu Zhong; Zhengxuan Zhang; Qiaofeng Wu; Yun Xue; Qianhua Cai; A Semantic Enhancement Framework for Multimodal Sarcasm Detection; Mathematics; multimodal sarcasm detection; contrastive learning; graph neural networks; social media
title | A Semantic Enhancement Framework for Multimodal Sarcasm Detection |
title_full | A Semantic Enhancement Framework for Multimodal Sarcasm Detection |
title_fullStr | A Semantic Enhancement Framework for Multimodal Sarcasm Detection |
title_full_unstemmed | A Semantic Enhancement Framework for Multimodal Sarcasm Detection |
title_short | A Semantic Enhancement Framework for Multimodal Sarcasm Detection |
title_sort | semantic enhancement framework for multimodal sarcasm detection |
topic | multimodal sarcasm detection; contrastive learning; graph neural networks; social media
url | https://www.mdpi.com/2227-7390/12/2/317 |
work_keys_str_mv | AT weiyuzhong asemanticenhancementframeworkformultimodalsarcasmdetection AT zhengxuanzhang asemanticenhancementframeworkformultimodalsarcasmdetection AT qiaofengwu asemanticenhancementframeworkformultimodalsarcasmdetection AT yunxue asemanticenhancementframeworkformultimodalsarcasmdetection AT qianhuacai asemanticenhancementframeworkformultimodalsarcasmdetection AT weiyuzhong semanticenhancementframeworkformultimodalsarcasmdetection AT zhengxuanzhang semanticenhancementframeworkformultimodalsarcasmdetection AT qiaofengwu semanticenhancementframeworkformultimodalsarcasmdetection AT yunxue semanticenhancementframeworkformultimodalsarcasmdetection AT qianhuacai semanticenhancementframeworkformultimodalsarcasmdetection |