A Survey of Full-Cycle Cross-Modal Retrieval: From a Representation Learning Perspective

Cross-modal retrieval aims to elucidate information fusion, imitate human learning, and advance the field. Although previous reviews have primarily focused on binary and real-valued coding methods, few have examined techniques grounded in deep representation learning. In this paper, we concentrate on harmonizing cross-modal representation learning with full-cycle modeling of the high-level semantic associations between vision and language, diverging from traditional statistical methods. We systematically categorize and summarize the challenges and open issues in current technologies, and we survey the full pipeline of cross-modal retrieval: pre-processing, feature engineering, pre-training tasks, encoding, cross-modal interaction, decoding, model optimization, and unified architectures. Furthermore, we present benchmark datasets and evaluation metrics to help researchers keep pace with advances in cross-modal retrieval. By incorporating recent innovative works, we offer a perspective on potential future directions for the field.

Bibliographic Details
Main Authors: Suping Wang, Ligu Zhu, Lei Shi, Hao Mo, Songfu Tan
Format: Article
Language: English
Published: MDPI AG 2023-04-01
Series: Applied Sciences
Subjects: cross-modal retrieval; representation learning; full-cycle modeling; feature engineering; pre-training tasks
Online Access: https://www.mdpi.com/2076-3417/13/7/4571
DOI: 10.3390/app13074571
ISSN: 2076-3417
Citation: Applied Sciences 13(7), 4571, 2023
Author Affiliation (all authors): State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China
Collection: DOAJ (Directory of Open Access Journals)
Record ID: doaj.art-f0423bb92d9e4d678c429ffdc18bfedc
Description: Cross-modal retrieval aims to elucidate information fusion, imitate human learning, and advance the field. Although previous reviews have primarily focused on binary and real-valued coding methods, few have examined techniques grounded in deep representation learning. In this paper, we concentrate on harmonizing cross-modal representation learning with full-cycle modeling of the high-level semantic associations between vision and language, diverging from traditional statistical methods. We systematically categorize and summarize the challenges and open issues in current technologies, and we survey the full pipeline of cross-modal retrieval: pre-processing, feature engineering, pre-training tasks, encoding, cross-modal interaction, decoding, model optimization, and unified architectures. Furthermore, we present benchmark datasets and evaluation metrics to help researchers keep pace with advances in cross-modal retrieval. By incorporating recent innovative works, we offer a perspective on potential future directions for the field.
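The pipeline the abstract enumerates (encode each modality, interact in a shared space, rank by similarity) can be illustrated with a minimal sketch. This is a toy example, not the survey's method: `encode_text` is a hypothetical stand-in for the vision and language encoders the paper discusses, and retrieval is plain cosine-similarity ranking in the shared embedding space.

```python
import math

def encode_text(text):
    # Toy "encoder": a 4-dimensional embedding built from character codes.
    # A real system would use a learned image or text encoder here.
    vec = [0.0] * 4
    for i, ch in enumerate(text.lower()):
        vec[i % 4] += ord(ch)
    return vec

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, gallery):
    # Rank gallery items (id, embedding) by similarity to the query,
    # most similar first.
    return sorted(gallery, key=lambda item: cosine(query_vec, item[1]),
                  reverse=True)

# Cross-modal usage: a text query ranks a gallery of (here simulated)
# image embeddings that live in the same shared space.
gallery = [("img1", encode_text("a cat on a mat")),
           ("img2", encode_text("city skyline at night"))]
ranked = retrieve(encode_text("a cat on a mat"), gallery)
```

In a deep representation-learning system the two encoders are trained jointly (e.g. with a contrastive objective) so that matching image-text pairs land close together; the ranking step itself stays this simple.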