Soft Contrastive Cross-Modal Retrieval
Cross-modal retrieval plays a key role in Natural Language Processing, where the goal is to efficiently retrieve items of one modality in response to queries from another. Despite the notable achievements of existing cross-modal retrieval methods, the complexity of the embedding space increases with more complex models, leading to less interpretable and potentially overfitted representations. Moreover, most existing methods achieve their strong results on datasets free of errors and noise, a highly idealized setting that leaves the trained models lacking robustness. To address these problems, we propose a novel approach, Soft Contrastive Cross-Modal Retrieval (SCCMR), which integrates a deep cross-modal model with soft contrastive learning and smooth-label cross-entropy learning to improve common-subspace embedding and strengthen the generalizability and robustness of the model. To confirm the performance and effectiveness of SCCMR, we conduct extensive experiments against 12 state-of-the-art methods on three multi-modal datasets, using image–text retrieval as a showcase. The experimental results show that our proposed method outperforms the baselines.
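As a rough illustration of the method the abstract describes, the sketch below pairs an in-batch image–text contrastive objective with label smoothing, i.e., "soft" targets in place of one-hot ones. This is a minimal sketch under assumed conventions, not the paper's published implementation; the function name `soft_contrastive_loss` and the `temperature` and `smooth_eps` parameters are illustrative choices, not the authors' API.

```python
# Illustrative sketch only: a label-smoothed ("soft") in-batch contrastive
# loss for image-text common-subspace embeddings. Not the SCCMR authors'
# code; hyperparameter names below are assumptions for the example.
import torch
import torch.nn.functional as F


def soft_contrastive_loss(img_emb: torch.Tensor,
                          txt_emb: torch.Tensor,
                          temperature: float = 0.07,
                          smooth_eps: float = 0.1) -> torch.Tensor:
    """img_emb, txt_emb: (N, D) batches where row i of each is a matched pair."""
    # Cosine similarities between every image and every text in the batch.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature  # (N, N)

    # Soft targets: 1 - eps for the true pair on the diagonal, with the
    # remaining eps spread uniformly over the in-batch negatives.
    n = logits.size(0)
    targets = torch.full_like(logits, smooth_eps / (n - 1))
    targets.fill_diagonal_(1.0 - smooth_eps)

    # Smooth-label cross-entropy in both retrieval directions
    # (image -> text and text -> image), averaged.
    loss_i2t = -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2i = -(targets * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    torch.manual_seed(0)
    img = torch.randn(8, 256)  # dummy image embeddings
    txt = torch.randn(8, 256)  # dummy text embeddings
    print(soft_contrastive_loss(img, txt).item())
```

Spreading a little probability mass from the matched pair onto the in-batch negatives is one standard way to make contrastive training more tolerant of noisy or mismatched pairs, which matches the robustness motivation stated in the abstract.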
Main Authors: | Jiayu Song, Yuxuan Hu, Lei Zhu, Chengyuan Zhang, Jian Zhang, Shichao Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-02-01 |
Series: | Applied Sciences |
Subjects: | cross-modal retrieval; soft contrastive learning; smooth label learning; common subspace; deep learning |
Online Access: | https://www.mdpi.com/2076-3417/14/5/1944 |
_version_ | 1827319790012727296 |
---|---|
author | Jiayu Song; Yuxuan Hu; Lei Zhu; Chengyuan Zhang; Jian Zhang; Shichao Zhang |
collection | DOAJ |
description | Cross-modal retrieval plays a key role in Natural Language Processing, where the goal is to efficiently retrieve items of one modality in response to queries from another. Despite the notable achievements of existing cross-modal retrieval methods, the complexity of the embedding space increases with more complex models, leading to less interpretable and potentially overfitted representations. Moreover, most existing methods achieve their strong results on datasets free of errors and noise, a highly idealized setting that leaves the trained models lacking robustness. To address these problems, we propose a novel approach, Soft Contrastive Cross-Modal Retrieval (SCCMR), which integrates a deep cross-modal model with soft contrastive learning and smooth-label cross-entropy learning to improve common-subspace embedding and strengthen the generalizability and robustness of the model. To confirm the performance and effectiveness of SCCMR, we conduct extensive experiments against 12 state-of-the-art methods on three multi-modal datasets, using image–text retrieval as a showcase. The experimental results show that our proposed method outperforms the baselines. |
first_indexed | 2024-04-25T00:35:40Z |
format | Article |
id | doaj.art-a1384ba3bbfc405e9c56d54c95fb56df |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-04-25T00:35:40Z |
publishDate | 2024-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | Applied Sciences, vol. 14, no. 5, art. 1944, published 2024-02-01; DOI: 10.3390/app14051944; record timestamp 2024-03-12T16:39:33Z. Author affiliations: Jiayu Song, Yuxuan Hu, Jian Zhang, and Shichao Zhang (School of Computer Science and Engineering, Central South University, Changsha 410083, China); Lei Zhu (College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China); Chengyuan Zhang (College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China) |
title | Soft Contrastive Cross-Modal Retrieval |
topic | cross-modal retrieval; soft contrastive learning; smooth label learning; common subspace; deep learning |
url | https://www.mdpi.com/2076-3417/14/5/1944 |