Soft Contrastive Cross-Modal Retrieval

Cross-modal retrieval plays a key role in natural language processing; it aims to efficiently retrieve items of one modality using a query from another. Despite the notable achievements of existing cross-modal retrieval methods, the embedding space grows more complex as models do, leading to less interpretable and potentially overfitted representations. Moreover, most existing methods achieve outstanding results on datasets free of errors and noise, an idealized setting that leaves the trained models lacking robustness. To address these problems, we propose a novel approach, Soft Contrastive Cross-Modal Retrieval (SCCMR), which integrates a deep cross-modal model with soft contrastive learning and smooth label cross-entropy learning to strengthen the common-subspace embedding and improve the generalizability and robustness of the model. To confirm the performance and effectiveness of SCCMR, we conduct extensive experiments against 12 state-of-the-art methods on three multi-modal datasets, using image–text retrieval as a showcase. The experimental results show that our proposed method outperforms the baselines.
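The abstract names two training signals: soft contrastive learning and smooth label cross-entropy learning over a shared embedding space. The sketch below is not the SCCMR implementation (the function names, the temperature `tau`, and the smoothing rate `eps` are all assumptions); it only illustrates the standard forms of these two losses, with smoothed one-hot targets replacing the hard positives of ordinary InfoNCE-style contrastive learning.

```python
import numpy as np

def log_softmax(x):
    # numerically stable log-softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def smooth_label_cross_entropy(logits, targets, eps=0.1):
    # cross-entropy against smoothed one-hot targets:
    # 1 - eps on the true class, eps spread evenly over the others
    n = logits.shape[-1]
    soft = np.full_like(logits, eps / (n - 1))
    soft[np.arange(len(logits)), targets] = 1.0 - eps
    return float(-(soft * log_softmax(logits)).sum(axis=-1).mean())

def soft_contrastive_loss(img, txt, tau=0.07, eps=0.1):
    # cosine-similarity logits between L2-normalized embeddings of a
    # batch of matched image-text pairs; matches lie on the diagonal
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    sim = img @ txt.T / tau
    targets = np.arange(len(img))
    # symmetric over both retrieval directions (image->text, text->image)
    return 0.5 * (smooth_label_cross_entropy(sim, targets, eps)
                  + smooth_label_cross_entropy(sim.T, targets, eps))
```

Under this formulation, a batch whose image and text embeddings are well aligned should score a lower loss than the same batch with the pairings shuffled, while the smoothing keeps the model from assigning vanishing probability to near-duplicate negatives.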


Bibliographic Details
Main Authors: Jiayu Song, Yuxuan Hu, Lei Zhu, Chengyuan Zhang, Jian Zhang, Shichao Zhang
Format: Article
Language: English
Published: MDPI AG 2024-02-01
Series: Applied Sciences
Subjects: cross-modal retrieval, soft contrastive learning, smooth label learning, common subspace, deep learning
Online Access: https://www.mdpi.com/2076-3417/14/5/1944
_version_ 1827319790012727296
author Jiayu Song
Yuxuan Hu
Lei Zhu
Chengyuan Zhang
Jian Zhang
Shichao Zhang
author_facet Jiayu Song
Yuxuan Hu
Lei Zhu
Chengyuan Zhang
Jian Zhang
Shichao Zhang
author_sort Jiayu Song
collection DOAJ
description Cross-modal retrieval plays a key role in natural language processing; it aims to efficiently retrieve items of one modality using a query from another. Despite the notable achievements of existing cross-modal retrieval methods, the embedding space grows more complex as models do, leading to less interpretable and potentially overfitted representations. Moreover, most existing methods achieve outstanding results on datasets free of errors and noise, an idealized setting that leaves the trained models lacking robustness. To address these problems, we propose a novel approach, Soft Contrastive Cross-Modal Retrieval (SCCMR), which integrates a deep cross-modal model with soft contrastive learning and smooth label cross-entropy learning to strengthen the common-subspace embedding and improve the generalizability and robustness of the model. To confirm the performance and effectiveness of SCCMR, we conduct extensive experiments against 12 state-of-the-art methods on three multi-modal datasets, using image–text retrieval as a showcase. The experimental results show that our proposed method outperforms the baselines.
first_indexed 2024-04-25T00:35:40Z
format Article
id doaj.art-a1384ba3bbfc405e9c56d54c95fb56df
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-04-25T00:35:40Z
publishDate 2024-02-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-a1384ba3bbfc405e9c56d54c95fb56df2024-03-12T16:39:33ZengMDPI AGApplied Sciences2076-34172024-02-01145194410.3390/app14051944Soft Contrastive Cross-Modal RetrievalJiayu Song0Yuxuan Hu1Lei Zhu2Chengyuan Zhang3Jian Zhang4Shichao Zhang5School of Computer Science and Engineering, Central South University, Changsha 410083, ChinaSchool of Computer Science and Engineering, Central South University, Changsha 410083, ChinaCollege of Information and Intelligence, Hunan Agricultural University, Changsha 410128, ChinaCollege of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, ChinaSchool of Computer Science and Engineering, Central South University, Changsha 410083, ChinaSchool of Computer Science and Engineering, Central South University, Changsha 410083, ChinaCross-modal retrieval plays a key role in the Natural Language Processing area, which aims to retrieve one modality to another efficiently. Despite the notable achievements of existing cross-modal retrieval methodologies, the complexity of the embedding space increases with more complex models, leading to less interpretable and potentially overfitting representations. Most existing methods realize outstanding results based on datasets without any error or noise, but that is extremely ideal and leads to trained models lacking robustness. To solve these problems, in this paper, we propose a novel approach, Soft Contrastive Cross-Modal Retrieval (SCCMR), which integrates the deep cross-modal model with soft contrastive learning and smooth label cross-entropy learning to boost common subspace embedding and improve the generalizability and robustness of the model. To confirm the performance and effectiveness of SCCMR, we conduct extensive experiments comparing 12 state-of-the-art methods on three multi-modal datasets by using image–text retrieval as a showcase. The experimental results show that our proposed method outperforms the baselines.https://www.mdpi.com/2076-3417/14/5/1944cross-modal retrievalsoft contrastive learningsmooth label learningcommon subspacedeep learning
spellingShingle Jiayu Song
Yuxuan Hu
Lei Zhu
Chengyuan Zhang
Jian Zhang
Shichao Zhang
Soft Contrastive Cross-Modal Retrieval
Applied Sciences
cross-modal retrieval
soft contrastive learning
smooth label learning
common subspace
deep learning
title Soft Contrastive Cross-Modal Retrieval
title_full Soft Contrastive Cross-Modal Retrieval
title_fullStr Soft Contrastive Cross-Modal Retrieval
title_full_unstemmed Soft Contrastive Cross-Modal Retrieval
title_short Soft Contrastive Cross-Modal Retrieval
title_sort soft contrastive cross modal retrieval
topic cross-modal retrieval
soft contrastive learning
smooth label learning
common subspace
deep learning
url https://www.mdpi.com/2076-3417/14/5/1944
work_keys_str_mv AT jiayusong softcontrastivecrossmodalretrieval
AT yuxuanhu softcontrastivecrossmodalretrieval
AT leizhu softcontrastivecrossmodalretrieval
AT chengyuanzhang softcontrastivecrossmodalretrieval
AT jianzhang softcontrastivecrossmodalretrieval
AT shichaozhang softcontrastivecrossmodalretrieval