A Survey of Full-Cycle Cross-Modal Retrieval: From a Representation Learning Perspective

Cross-modal retrieval aims to elucidate information fusion, imitate human learning, and advance the field. Although previous reviews have primarily focused on binary and real-valued coding methods, few have examined techniques grounded in deep representation learning. In this paper, we concentrate on harmonizing cross-modal representation learning with full-cycle modeling of the high-level semantic associations between vision and language, diverging from traditional statistical methods. We systematically categorize and summarize the challenges and open issues in current technologies, and we survey the full pipeline of cross-modal retrieval: pre-processing, feature engineering, pre-training tasks, encoding, cross-modal interaction, decoding, model optimization, and unified architectures. Furthermore, we present benchmark datasets and evaluation metrics to help researchers keep pace with advances in cross-modal retrieval. By incorporating recent innovative works, we offer a perspective on potential future directions for the field.

Bibliographic Details
Main Authors: Suping Wang, Ligu Zhu, Lei Shi, Hao Mo, Songfu Tan
Format: Article
Language: English
Published: MDPI AG 2023-04-01
Series: Applied Sciences
Subjects: cross-modal retrieval; representation learning; full-cycle modeling; feature engineering; pre-training tasks
Online Access: https://www.mdpi.com/2076-3417/13/7/4571
DOI: 10.3390/app13074571
ISSN: 2076-3417
Citation: Applied Sciences 13(7), 4571, 2023
Author Affiliation (all authors): State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China
Collection: DOAJ (Directory of Open Access Journals)
Record ID: doaj.art-f0423bb92d9e4d678c429ffdc18bfedc
Description: Cross-modal retrieval aims to elucidate information fusion, imitate human learning, and advance the field. Although previous reviews have primarily focused on binary and real-valued coding methods, few have examined techniques grounded in deep representation learning. In this paper, we concentrate on harmonizing cross-modal representation learning with full-cycle modeling of the high-level semantic associations between vision and language, diverging from traditional statistical methods. We systematically categorize and summarize the challenges and open issues in current technologies, and we survey the full pipeline of cross-modal retrieval: pre-processing, feature engineering, pre-training tasks, encoding, cross-modal interaction, decoding, model optimization, and unified architectures. Furthermore, we present benchmark datasets and evaluation metrics to help researchers keep pace with advances in cross-modal retrieval. By incorporating recent innovative works, we offer a perspective on potential future directions for the field.
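The pipeline the abstract enumerates (encode each modality, interact in a shared space, rank by similarity) can be illustrated with a minimal sketch. This is a toy example, not the survey's method: `encode_text` is a hypothetical stand-in for the vision and language encoders the paper discusses, and retrieval is plain cosine-similarity ranking in the shared embedding space.

```python
import math

def encode_text(text):
    # Toy "encoder": a 4-dimensional embedding built from character codes.
    # A real system would use a learned image or text encoder here.
    vec = [0.0] * 4
    for i, ch in enumerate(text.lower()):
        vec[i % 4] += ord(ch)
    return vec

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, gallery):
    # Rank gallery items (id, embedding) by similarity to the query,
    # most similar first.
    return sorted(gallery, key=lambda item: cosine(query_vec, item[1]),
                  reverse=True)

# Cross-modal usage: a text query ranks a gallery of (here simulated)
# image embeddings that live in the same shared space.
gallery = [("img1", encode_text("a cat on a mat")),
           ("img2", encode_text("city skyline at night"))]
ranked = retrieve(encode_text("a cat on a mat"), gallery)
```

In a deep representation-learning system the two encoders are trained jointly (e.g. with a contrastive objective) so that matching image-text pairs land close together; the ranking step itself stays this simple.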