Summary: | With the explosive growth of multimodal Internet data, cross-modal hashing retrieval has become crucial for semantically searching instances across modalities. However, existing cross-modal retrieval methods assume perfect consistency both between modalities and between modalities and labels, assumptions that often fail on real-world data. We identify two types of inconsistency, Modality-Modality (M-M) and Modality-Label (M-L), validate their prevalence in multimodal datasets, and show that such inconsistent data reduces the accuracy of existing cross-modal retrieval methods. In this paper, we propose a novel framework, Inconsistency Alleviated Cross-Modal Retrieval (IA-CMR), to address the challenges these inconsistencies pose. We first employ two forms of contrastive learning loss together with a mutual exclusion constraint to disentangle modal information into modality-common and modality-unique hash codes; this dedicated disentanglement design alleviates the M-M inconsistency. We then refine common labels through a label refinement loss and align modalities with a Cross-modal Common Semantic Alignment (CCSA) module; the label refinement process and the CCSA module together handle the M-L inconsistency. IA-CMR outperforms nine comparison baselines on two benchmark multimodal datasets, improving retrieval accuracy by up to 25.13%. These results confirm the effectiveness of IA-CMR in alleviating inconsistency and enhancing cross-modal retrieval performance.
|
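To make the disentanglement idea concrete, the sketch below illustrates one common way such objectives are formed: an InfoNCE-style contrastive loss that pulls the modality-common representations of matched image/text pairs together, plus a mutual-exclusion term that pushes each instance's common and unique codes toward orthogonality. This is a hypothetical illustration of the general technique, not the paper's exact losses; all function and variable names are assumptions.

```python
import numpy as np

def disentangle_losses(img_common, txt_common, img_unique, txt_unique, tau=0.5):
    """Illustrative objectives for splitting each modality's features into a
    modality-common part (aligned across modalities) and a modality-unique
    part (kept apart from the common part). Hypothetical sketch only."""
    def norm(x):
        # Row-wise L2 normalization so dot products become cosine similarities.
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    ic, tc = norm(img_common), norm(txt_common)

    # InfoNCE-style contrastive loss: matched image/text pairs (the diagonal
    # of the batch similarity matrix) should score higher than mismatched ones.
    logits = ic @ tc.T / tau
    def xent(l):
        l = l - l.max(axis=1, keepdims=True)           # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))                 # matched pairs on diagonal
    contrastive = 0.5 * (xent(logits) + xent(logits.T))

    # Mutual-exclusion constraint: drive the squared cosine similarity between
    # each instance's common and unique codes toward zero (orthogonality).
    iu, tu = norm(img_unique), norm(txt_unique)
    exclusion = (np.mean(np.sum(ic * iu, axis=1) ** 2)
                 + np.mean(np.sum(tc * tu, axis=1) ** 2))
    return contrastive, exclusion
```

In a full system these two terms would be weighted and minimized jointly with the hashing and label-refinement objectives; here they simply show how "pull common codes together, keep unique codes apart" can be expressed as differentiable losses.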