Learning Multimodal Representations by Symmetrically Transferring Local Structures
Multimodal representations play an important role in multimodal learning tasks, including cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building one common space by aligning different modalities and ignore the complementary...
Main Authors: | Bin Dong, Songlei Jian, Kai Lu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-09-01 |
Series: | Symmetry |
Subjects: | multimodal representations, soft metric learning, local structure, neural networks |
Online Access: | https://www.mdpi.com/2073-8994/12/9/1504 |
_version_ | 1797553863980482560 |
---|---|
author | Bin Dong Songlei Jian Kai Lu |
author_facet | Bin Dong Songlei Jian Kai Lu |
author_sort | Bin Dong |
collection | DOAJ |
description | Multimodal representations play an important role in multimodal learning tasks, including cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building one common space by aligning different modalities and ignore the complementary information across the modalities, such as the intra-modal local structures. In other words, they focus only on object-level alignment and ignore structure-level alignment. To tackle this problem, we propose a novel symmetric multimodal representation learning framework, namely MTLS, which transfers local structures across different modalities. A customized soft metric learning strategy and an iterative parameter learning process are designed to symmetrically transfer local structures and enhance the cluster structures in intra-modal representations. A bidirectional retrieval loss based on multi-layer neural networks is used to align the two modalities. MTLS is instantiated with image and text data and shows superior performance on image-text retrieval and image clustering. MTLS outperforms the state-of-the-art multimodal learning methods by up to 32% in terms of R@1 on text-image retrieval and 16.4% in terms of AMI on clustering. |
first_indexed | 2024-03-10T16:22:40Z |
format | Article |
id | doaj.art-947c9fb294b84ac9ad71a02d72d756ba |
institution | Directory Open Access Journal |
issn | 2073-8994 |
language | English |
last_indexed | 2024-03-10T16:22:40Z |
publishDate | 2020-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Symmetry |
spelling | doaj.art-947c9fb294b84ac9ad71a02d72d756ba; updated 2023-11-20T13:34:06Z; eng; Symmetry (MDPI AG), ISSN 2073-8994; published 2020-09-01, vol. 12, no. 9, article 1504; doi:10.3390/sym12091504; Learning Multimodal Representations by Symmetrically Transferring Local Structures; Bin Dong, Songlei Jian, Kai Lu (College of Computer, National University of Defense Technology, Changsha 410000, China); https://www.mdpi.com/2073-8994/12/9/1504 |
spellingShingle | Bin Dong Songlei Jian Kai Lu Learning Multimodal Representations by Symmetrically Transferring Local Structures Symmetry multimodal representations soft metric learning local structure neural networks |
title | Learning Multimodal Representations by Symmetrically Transferring Local Structures |
title_full | Learning Multimodal Representations by Symmetrically Transferring Local Structures |
title_fullStr | Learning Multimodal Representations by Symmetrically Transferring Local Structures |
title_full_unstemmed | Learning Multimodal Representations by Symmetrically Transferring Local Structures |
title_short | Learning Multimodal Representations by Symmetrically Transferring Local Structures |
title_sort | learning multimodal representations by symmetrically transferring local structures |
topic | multimodal representations soft metric learning local structure neural networks |
url | https://www.mdpi.com/2073-8994/12/9/1504 |
work_keys_str_mv | AT bindong learningmultimodalrepresentationsbysymmetricallytransferringlocalstructures AT songleijian learningmultimodalrepresentationsbysymmetricallytransferringlocalstructures AT kailu learningmultimodalrepresentationsbysymmetricallytransferringlocalstructures |
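The abstract above mentions a bidirectional retrieval loss used to align the image and text modalities. As a rough illustration only (not taken from the paper), the sketch below shows a common hinge-based bidirectional ranking loss of the kind often used for image-text retrieval; the function name, margin value, and PyTorch framing are assumptions, not the authors' implementation.

```python
import torch


def bidirectional_retrieval_loss(img_emb, txt_emb, margin=0.2):
    """Hinge-based ranking loss over both retrieval directions.

    img_emb, txt_emb: (batch, dim) L2-normalised embeddings where row i
    of each tensor forms a matching image-text pair.
    """
    # Cosine similarity between every image and every text in the batch.
    scores = img_emb @ txt_emb.t()            # (batch, batch)
    pos = scores.diag().view(-1, 1)           # similarities of the true pairs

    # Image -> text direction: every non-matching text is a negative.
    cost_i2t = (margin + scores - pos).clamp(min=0)
    # Text -> image direction: every non-matching image is a negative.
    cost_t2i = (margin + scores - pos.t()).clamp(min=0)

    # Zero out the diagonal so positives do not contribute to the cost.
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_i2t = cost_i2t.masked_fill(mask, 0)
    cost_t2i = cost_t2i.masked_fill(mask, 0)

    return cost_i2t.sum() + cost_t2i.sum()
```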