A Survey of Full-Cycle Cross-Modal Retrieval: From a Representation Learning Perspective
Cross-modal retrieval aims to elucidate information fusion, imitate human learning, and advance the field. Although previous reviews have primarily focused on binary and real-value coding methods, there is a scarcity of techniques grounded in deep representation learning. In this paper, we concentrated on harmonizing cross-modal representation learning and the full-cycle modeling of high-level semantic associations between vision and language, diverging from traditional statistical methods. We systematically categorized and summarized the challenges and open issues in implementing current technologies and investigated the pipeline of cross-modal retrieval, including pre-processing, feature engineering, pre-training tasks, encoding, cross-modal interaction, decoding, model optimization, and a unified architecture. Furthermore, we propose benchmark datasets and evaluation metrics to assist researchers in keeping pace with cross-modal retrieval advancements. By incorporating recent innovative works, we offer a perspective on potential advancements in cross-modal retrieval.
Main Authors: | Suping Wang, Ligu Zhu, Lei Shi, Hao Mo, Songfu Tan |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-04-01 |
Series: | Applied Sciences |
Subjects: | cross-modal retrieval; representation learning; full-cycle modeling; feature engineering; pre-training tasks |
Online Access: | https://www.mdpi.com/2076-3417/13/7/4571 |
_version_ | 1797608361004367872 |
---|---|
author | Suping Wang; Ligu Zhu; Lei Shi; Hao Mo; Songfu Tan |
author_facet | Suping Wang; Ligu Zhu; Lei Shi; Hao Mo; Songfu Tan |
author_sort | Suping Wang |
collection | DOAJ |
description | Cross-modal retrieval aims to elucidate information fusion, imitate human learning, and advance the field. Although previous reviews have primarily focused on binary and real-value coding methods, there is a scarcity of techniques grounded in deep representation learning. In this paper, we concentrated on harmonizing cross-modal representation learning and the full-cycle modeling of high-level semantic associations between vision and language, diverging from traditional statistical methods. We systematically categorized and summarized the challenges and open issues in implementing current technologies and investigated the pipeline of cross-modal retrieval, including pre-processing, feature engineering, pre-training tasks, encoding, cross-modal interaction, decoding, model optimization, and a unified architecture. Furthermore, we propose benchmark datasets and evaluation metrics to assist researchers in keeping pace with cross-modal retrieval advancements. By incorporating recent innovative works, we offer a perspective on potential advancements in cross-modal retrieval. |
first_indexed | 2024-03-11T05:42:23Z |
format | Article |
id | doaj.art-f0423bb92d9e4d678c429ffdc18bfedc |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-11T05:42:23Z |
publishDate | 2023-04-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-f0423bb92d9e4d678c429ffdc18bfedc · 2023-11-17T16:22:15Z · eng · MDPI AG · Applied Sciences · ISSN 2076-3417 · 2023-04-01 · Vol. 13, Iss. 7, Art. 4571 · doi:10.3390/app13074571 · A Survey of Full-Cycle Cross-Modal Retrieval: From a Representation Learning Perspective · Suping Wang, Ligu Zhu, Lei Shi, Hao Mo, Songfu Tan (State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China) · https://www.mdpi.com/2076-3417/13/7/4571 · cross-modal retrieval; representation learning; full-cycle modeling; feature engineering; pre-training tasks |
title | A Survey of Full-Cycle Cross-Modal Retrieval: From a Representation Learning Perspective |
topic | cross-modal retrieval; representation learning; full-cycle modeling; feature engineering; pre-training tasks |
url | https://www.mdpi.com/2076-3417/13/7/4571 |