Summary: | With the explosive growth of multimodal Internet data, cross-modal hashing retrieval has become crucial for semantically searching instances across modalities. However, existing cross-modal retrieval methods assume perfect consistency both between modalities and between modalities and labels, assumptions that often fail on real-world data. We identify two types of inconsistency, Modality-Modality (M-M) and Modality-Label (M-L), validate their prevalence in multimodal datasets, and show that such inconsistent data reduces the accuracy of existing cross-modal retrieval methods. In this paper, we propose a novel framework, Inconsistency Alleviated Cross-Modal Retrieval (IA-CMR), to address the challenges these inconsistencies pose. We first employ two forms of contrastive learning loss together with a mutual exclusion constraint to disentangle modal information into modality-common and modality-unique hash codes; this dedicated disentanglement design alleviates the M-M inconsistency. We then refine common labels through a label refinement loss and align modalities with a Cross-modal Common Semantic Alignment (CCSA) module; the label refinement process and the CCSA module together handle the M-L inconsistency. IA-CMR outperforms nine comparison baselines on two benchmark multimodal datasets, improving retrieval accuracy by up to 25.13%. These results confirm the effectiveness of IA-CMR in alleviating inconsistency and enhancing cross-modal retrieval performance.
|
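To make the disentanglement idea concrete, the sketch below illustrates one common way such objectives are formed: an InfoNCE-style contrastive loss that pulls the modality-common representations of matched image/text pairs together, plus a mutual-exclusion term that pushes each instance's common and unique codes toward orthogonality. This is a hypothetical illustration of the general technique, not the paper's exact losses; all function and variable names are assumptions.

```python
import numpy as np

def disentangle_losses(img_common, txt_common, img_unique, txt_unique, tau=0.5):
    """Illustrative objectives for splitting each modality's features into a
    modality-common part (aligned across modalities) and a modality-unique
    part (kept apart from the common part). Hypothetical sketch only."""
    def norm(x):
        # Row-wise L2 normalization so dot products become cosine similarities.
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    ic, tc = norm(img_common), norm(txt_common)

    # InfoNCE-style contrastive loss: matched image/text pairs (the diagonal
    # of the batch similarity matrix) should score higher than mismatched ones.
    logits = ic @ tc.T / tau
    def xent(l):
        l = l - l.max(axis=1, keepdims=True)           # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))                 # matched pairs on diagonal
    contrastive = 0.5 * (xent(logits) + xent(logits.T))

    # Mutual-exclusion constraint: drive the squared cosine similarity between
    # each instance's common and unique codes toward zero (orthogonality).
    iu, tu = norm(img_unique), norm(txt_unique)
    exclusion = (np.mean(np.sum(ic * iu, axis=1) ** 2)
                 + np.mean(np.sum(tc * tu, axis=1) ** 2))
    return contrastive, exclusion
```

In a full system these two terms would be weighted and minimized jointly with the hashing and label-refinement objectives; here they simply show how "pull common codes together, keep unique codes apart" can be expressed as differentiable losses.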