Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval
Multi-modal retrieval is challenging due to the heterogeneous gap and the complex semantic relationships between different modalities. Typical approaches map different modalities into a common subspace using a one-to-one correspondence or a similarity/dissimilarity relationship between inter-modal data, in which the...
Main Authors: | Yan Hua, Yingyun Yang, Jianhe Du |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-03-01 |
Series: | Electronics |
Subjects: | deep learning; metric learning; multi-modal correlation; cross-modal retrieval; image–text retrieval |
Online Access: | https://www.mdpi.com/2079-9292/9/3/466 |
_version_ | 1798026931174637568 |
---|---|
author | Yan Hua; Yingyun Yang; Jianhe Du
author_facet | Yan Hua; Yingyun Yang; Jianhe Du
author_sort | Yan Hua |
collection | DOAJ |
description | Multi-modal retrieval is challenging due to the heterogeneous gap and the complex semantic relationships between different modalities. Typical approaches map different modalities into a common subspace using a one-to-one correspondence or a similarity/dissimilarity relationship between inter-modal data, in which the distances of heterogeneous data can be compared directly; inter-modal retrieval can then be achieved by nearest-neighbor search. However, most of these methods ignore intra-modal relations and the complicated semantics between multi-modal data. In this paper, we propose a deep multi-modal metric learning method with multi-scale semantic correlation for retrieval tasks between the image and text modalities. A deep model with two branches is designed to nonlinearly map raw heterogeneous data into comparable representations. In contrast to binary similarity, we formulate the semantic relationship as a multi-scale similarity to learn fine-grained multi-modal distances. Inter-modal and intra-modal correlations built on the multi-scale semantic similarity are incorporated to train the deep model end-to-end. Experiments validate the effectiveness of the proposed method on multi-modal retrieval tasks, and our method outperforms state-of-the-art methods on the NUS-WIDE, MIR Flickr, and Wikipedia datasets. |
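The abstract names three ingredients: a two-branch network mapping image and text features into a common space, a graded (multi-scale) semantic similarity instead of a binary similar/dissimilar label, and a loss combining inter-modal and intra-modal correlations. The following is a minimal PyTorch sketch of that idea, not the paper's exact formulation: the layer sizes, the Jaccard-overlap definition of multi-scale similarity, the MSE-style correlation loss, its 0.1 intra-modal weight, and all function/class names (`TwoBranchNet`, `multiscale_similarity`, `correlation_loss`) are illustrative assumptions.

```python
# Illustrative sketch only -- the authors' exact architecture and loss may differ.
# Assumes pre-extracted image features (e.g., CNN activations) and text features
# (e.g., bag-of-words/tag vectors), with multi-hot label annotations per sample.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchNet(nn.Module):
    """Two nonlinear branches mapping image/text features into a common space."""
    def __init__(self, img_dim=4096, txt_dim=1000, embed_dim=256):
        super().__init__()
        self.img_branch = nn.Sequential(
            nn.Linear(img_dim, 1024), nn.ReLU(), nn.Linear(1024, embed_dim))
        self.txt_branch = nn.Sequential(
            nn.Linear(txt_dim, 1024), nn.ReLU(), nn.Linear(1024, embed_dim))

    def forward(self, img_feat, txt_feat):
        # L2-normalize so inner products below are cosine similarities
        v = F.normalize(self.img_branch(img_feat), dim=1)
        t = F.normalize(self.txt_branch(txt_feat), dim=1)
        return v, t

def multiscale_similarity(labels_a, labels_b):
    """Graded semantic similarity from multi-hot label vectors (Jaccard overlap),
    in contrast to a binary similar/dissimilar indicator."""
    inter = (labels_a.unsqueeze(1) * labels_b.unsqueeze(0)).sum(-1)
    union = ((labels_a.unsqueeze(1) + labels_b.unsqueeze(0)) > 0).float().sum(-1)
    return inter / union.clamp(min=1)

def correlation_loss(v, t, labels):
    """Pull inter- and intra-modal cosine similarities toward the multi-scale
    semantic similarity, so learned distances become fine-grained."""
    s = multiscale_similarity(labels, labels)                 # targets in [0, 1]
    inter = F.mse_loss(v @ t.T, s)                            # image <-> text
    intra = F.mse_loss(v @ v.T, s) + F.mse_loss(t @ t.T, s)   # within modality
    return inter + 0.1 * intra                                # assumed trade-off weight

if __name__ == "__main__":
    # Toy usage: batch of 8 samples, 21 label concepts
    net = TwoBranchNet()
    img = torch.randn(8, 4096)
    txt = torch.randn(8, 1000)
    labels = (torch.rand(8, 21) > 0.8).float()
    v, t = net(img, txt)
    print(correlation_loss(v, t, labels).item())
```

With L2-normalized embeddings, `v @ t.T` is the matrix of inter-modal cosine similarities, so a graded target pulls semantically close image–text pairs together in proportion to their label overlap rather than forcing a hard similar/dissimilar split; the intra-modal terms are what the abstract contrasts against methods that model inter-modal relations only.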
first_indexed | 2024-04-11T18:43:05Z |
format | Article |
id | doaj.art-8e469b9a65b64fa4a6631f12f7b8c227 |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-04-11T18:43:05Z |
publishDate | 2020-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-8e469b9a65b64fa4a6631f12f7b8c227 | 2022-12-22T04:08:55Z | eng | MDPI AG | Electronics | 2079-9292 | 2020-03-01 | vol. 9, issue 3, article 466 | 10.3390/electronics9030466 | electronics9030466 | Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval | Yan Hua, Yingyun Yang, Jianhe Du (School of Information and Communication Engineering, Communication University of China, Beijing 100024, China) | [abstract as in the description field] | https://www.mdpi.com/2079-9292/9/3/466 | deep learning; metric learning; multi-modal correlation; cross-modal retrieval; image–text retrieval |
spellingShingle | Yan Hua; Yingyun Yang; Jianhe Du | Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval | Electronics | deep learning; metric learning; multi-modal correlation; cross-modal retrieval; image–text retrieval
title | Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval |
title_full | Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval |
title_fullStr | Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval |
title_full_unstemmed | Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval |
title_short | Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval |
title_sort | deep multi modal metric learning with multi scale correlation for image text retrieval |
topic | deep learning; metric learning; multi-modal correlation; cross-modal retrieval; image–text retrieval
url | https://www.mdpi.com/2079-9292/9/3/466 |
work_keys_str_mv | AT yanhua deepmultimodalmetriclearningwithmultiscalecorrelationforimagetextretrieval AT yingyunyang deepmultimodalmetriclearningwithmultiscalecorrelationforimagetextretrieval AT jianhedu deepmultimodalmetriclearningwithmultiscalecorrelationforimagetextretrieval |