Soft Contrastive Cross-Modal Retrieval

Cross-modal retrieval plays a key role in natural language processing; it aims to efficiently retrieve items of one modality using a query from another. Despite the notable achievements of existing cross-modal retrieval methods, the embedding space grows more complex as models do, leading to less interpretable and potentially overfitted representations. Moreover, most existing methods achieve outstanding results on datasets free of errors and noise, an idealized setting that leaves the trained models lacking robustness. To address these problems, we propose a novel approach, Soft Contrastive Cross-Modal Retrieval (SCCMR), which integrates a deep cross-modal model with soft contrastive learning and smooth label cross-entropy learning to strengthen the common-subspace embedding and improve the generalizability and robustness of the model. To confirm the performance and effectiveness of SCCMR, we conduct extensive experiments against 12 state-of-the-art methods on three multi-modal datasets, using image–text retrieval as a showcase. The experimental results show that our proposed method outperforms the baselines.
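The abstract names two training signals: soft contrastive learning and smooth label cross-entropy learning over a shared embedding space. The sketch below is not the SCCMR implementation (the function names, the temperature `tau`, and the smoothing rate `eps` are all assumptions); it only illustrates the standard forms of these two losses, with smoothed one-hot targets replacing the hard positives of ordinary InfoNCE-style contrastive learning.

```python
import numpy as np

def log_softmax(x):
    # numerically stable log-softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def smooth_label_cross_entropy(logits, targets, eps=0.1):
    # cross-entropy against smoothed one-hot targets:
    # 1 - eps on the true class, eps spread evenly over the others
    n = logits.shape[-1]
    soft = np.full_like(logits, eps / (n - 1))
    soft[np.arange(len(logits)), targets] = 1.0 - eps
    return float(-(soft * log_softmax(logits)).sum(axis=-1).mean())

def soft_contrastive_loss(img, txt, tau=0.07, eps=0.1):
    # cosine-similarity logits between L2-normalized embeddings of a
    # batch of matched image-text pairs; matches lie on the diagonal
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    sim = img @ txt.T / tau
    targets = np.arange(len(img))
    # symmetric over both retrieval directions (image->text, text->image)
    return 0.5 * (smooth_label_cross_entropy(sim, targets, eps)
                  + smooth_label_cross_entropy(sim.T, targets, eps))
```

Under this formulation, a batch whose image and text embeddings are well aligned should score a lower loss than the same batch with the pairings shuffled, while the smoothing keeps the model from assigning vanishing probability to near-duplicate negatives.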


Bibliographic Details
Main Authors: Jiayu Song, Yuxuan Hu, Lei Zhu, Chengyuan Zhang, Jian Zhang, Shichao Zhang
Format: Article
Language: English
Published: MDPI AG 2024-02-01
Series: Applied Sciences
Subjects: cross-modal retrieval, soft contrastive learning, smooth label learning, common subspace, deep learning
Online Access: https://www.mdpi.com/2076-3417/14/5/1944
_version_ 1827319790012727296
author Jiayu Song
Yuxuan Hu
Lei Zhu
Chengyuan Zhang
Jian Zhang
Shichao Zhang
author_facet Jiayu Song
Yuxuan Hu
Lei Zhu
Chengyuan Zhang
Jian Zhang
Shichao Zhang
author_sort Jiayu Song
collection DOAJ
description Cross-modal retrieval plays a key role in natural language processing; it aims to efficiently retrieve items of one modality using a query from another. Despite the notable achievements of existing cross-modal retrieval methods, the embedding space grows more complex as models do, leading to less interpretable and potentially overfitted representations. Moreover, most existing methods achieve outstanding results on datasets free of errors and noise, an idealized setting that leaves the trained models lacking robustness. To address these problems, we propose a novel approach, Soft Contrastive Cross-Modal Retrieval (SCCMR), which integrates a deep cross-modal model with soft contrastive learning and smooth label cross-entropy learning to strengthen the common-subspace embedding and improve the generalizability and robustness of the model. To confirm the performance and effectiveness of SCCMR, we conduct extensive experiments against 12 state-of-the-art methods on three multi-modal datasets, using image–text retrieval as a showcase. The experimental results show that our proposed method outperforms the baselines.
first_indexed 2024-04-25T00:35:40Z
format Article
id doaj.art-a1384ba3bbfc405e9c56d54c95fb56df
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-04-25T00:35:40Z
publishDate 2024-02-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-a1384ba3bbfc405e9c56d54c95fb56df2024-03-12T16:39:33ZengMDPI AGApplied Sciences2076-34172024-02-01145194410.3390/app14051944Soft Contrastive Cross-Modal RetrievalJiayu Song0Yuxuan Hu1Lei Zhu2Chengyuan Zhang3Jian Zhang4Shichao Zhang5School of Computer Science and Engineering, Central South University, Changsha 410083, ChinaSchool of Computer Science and Engineering, Central South University, Changsha 410083, ChinaCollege of Information and Intelligence, Hunan Agricultural University, Changsha 410128, ChinaCollege of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, ChinaSchool of Computer Science and Engineering, Central South University, Changsha 410083, ChinaSchool of Computer Science and Engineering, Central South University, Changsha 410083, ChinaCross-modal retrieval plays a key role in the Natural Language Processing area, which aims to retrieve one modality to another efficiently. Despite the notable achievements of existing cross-modal retrieval methodologies, the complexity of the embedding space increases with more complex models, leading to less interpretable and potentially overfitting representations. Most existing methods realize outstanding results based on datasets without any error or noise, but that is extremely ideal and leads to trained models lacking robustness. To solve these problems, in this paper, we propose a novel approach, Soft Contrastive Cross-Modal Retrieval (SCCMR), which integrates the deep cross-modal model with soft contrastive learning and smooth label cross-entropy learning to boost common subspace embedding and improve the generalizability and robustness of the model. To confirm the performance and effectiveness of SCCMR, we conduct extensive experiments comparing 12 state-of-the-art methods on three multi-modal datasets by using image–text retrieval as a showcase. The experimental results show that our proposed method outperforms the baselines.https://www.mdpi.com/2076-3417/14/5/1944cross-modal retrievalsoft contrastive learningsmooth label learningcommon subspacedeep learning
spellingShingle Jiayu Song
Yuxuan Hu
Lei Zhu
Chengyuan Zhang
Jian Zhang
Shichao Zhang
Soft Contrastive Cross-Modal Retrieval
Applied Sciences
cross-modal retrieval
soft contrastive learning
smooth label learning
common subspace
deep learning
title Soft Contrastive Cross-Modal Retrieval
title_full Soft Contrastive Cross-Modal Retrieval
title_fullStr Soft Contrastive Cross-Modal Retrieval
title_full_unstemmed Soft Contrastive Cross-Modal Retrieval
title_short Soft Contrastive Cross-Modal Retrieval
title_sort soft contrastive cross modal retrieval
topic cross-modal retrieval
soft contrastive learning
smooth label learning
common subspace
deep learning
url https://www.mdpi.com/2076-3417/14/5/1944
work_keys_str_mv AT jiayusong softcontrastivecrossmodalretrieval
AT yuxuanhu softcontrastivecrossmodalretrieval
AT leizhu softcontrastivecrossmodalretrieval
AT chengyuanzhang softcontrastivecrossmodalretrieval
AT jianzhang softcontrastivecrossmodalretrieval
AT shichaozhang softcontrastivecrossmodalretrieval