Cross‐modal semantic correlation learning by Bi‐CNN network

Abstract Cross-modal retrieval can retrieve images from a text query and vice versa. In recent years, cross-modal retrieval has attracted extensive attention. Most existing cross-modal retrieval methods aim to find a common subspace and maximize the correlation between modalities....

Full description

Bibliographic Details
Main Authors: Chaoyi Wang, Liang Li, Chenggang Yan, Zhan Wang, Yaoqi Sun, Jiyong Zhang
Format: Article
Language:English
Published: Wiley 2021-12-01
Series:IET Image Processing
Subjects:
Online Access:https://doi.org/10.1049/ipr2.12176
author Chaoyi Wang
Liang Li
Chenggang Yan
Zhan Wang
Yaoqi Sun
Jiyong Zhang
collection DOAJ
description Abstract Cross-modal retrieval can retrieve images from a text query and vice versa. In recent years, cross-modal retrieval has attracted extensive attention. Most existing cross-modal retrieval methods aim to find a common subspace and maximize the correlation between modalities. To generate representations tailored to cross-modal tasks, this paper proposes a novel cross-modal retrieval framework that integrates feature learning and latent-space embedding. In detail, a deep CNN and a shallow CNN are proposed to extract features from the samples: the deep CNN extracts image representations, while the shallow CNN uses multi-dimensional kernels to extract multi-level semantic representations of text. Meanwhile, the semantic manifold is enhanced by constructing a cross-modal ranking loss and a within-modal discriminant loss, improving the separation of semantic representations. Moreover, the most representative samples are selected by an online sampling strategy, so that the approach can be applied to large-scale data. The approach not only increases the discriminative ability among different categories but also maximizes the correlation between modalities. Experiments on three real-world datasets show that the proposed method outperforms popular methods.
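The record does not include the paper's exact loss formulation, so the snippet below is only a minimal pure-Python sketch of the kind of cross-modal margin ranking loss the abstract describes: a matching image–text embedding pair should score higher (here, by cosine similarity) than a mismatched pair by at least a margin. The function names (`cosine`, `ranking_loss`) and the margin value are illustrative assumptions, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranking_loss(img_emb, matching_txt, mismatched_txt, margin=0.2):
    """Margin-based cross-modal ranking loss for one triplet:
    penalize the case where a mismatched image-text pair scores
    within `margin` of (or above) the matching pair."""
    pos = cosine(img_emb, matching_txt)
    neg = cosine(img_emb, mismatched_txt)
    return max(0.0, margin - pos + neg)

# A well-aligned pair incurs no loss; a mismatched pair ranked
# above the match is penalized by the margin plus the score gap.
print(ranking_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))  # 0.0
print(ranking_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]))  # 1.2
```

In a full system this loss would be summed over mined triplets (the "online sampling strategy" the abstract mentions selects the most informative ones) and combined with a within-modal discriminant term; the sketch covers only the ranking component.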
format Article
id doaj.art-63c3d119128048deb09c7f15c31adc7e
institution Directory Open Access Journal
issn 1751-9659
1751-9667
language English
publishDate 2021-12-01
publisher Wiley
record_format Article
series IET Image Processing
spelling doaj.art-63c3d119128048deb09c7f15c31adc7e
Published: Wiley, IET Image Processing, ISSN 1751-9659 / 1751-9667, 2021-12-01, vol. 15, no. 14, pp. 3674-3684, doi:10.1049/ipr2.12176
Title: Cross‐modal semantic correlation learning by Bi‐CNN network
Author affiliations:
Chaoyi Wang: Hangzhou Dianzi University, Hangzhou, China
Liang Li: Institute of Computing Technology, CAS, Beijing, China
Chenggang Yan: Hangzhou Dianzi University, Hangzhou, China
Zhan Wang: RTInvent Technology Co., Ltd, Beijing, China
Yaoqi Sun: Hangzhou Dianzi University, Hangzhou, China
Jiyong Zhang: Hangzhou Dianzi University, Hangzhou, China
title Cross‐modal semantic correlation learning by Bi‐CNN network
topic Optical, image and video signal processing
Computer vision and image processing techniques
Information retrieval techniques
Neural nets
url https://doi.org/10.1049/ipr2.12176