Learning Multimodal Representations by Symmetrically Transferring Local Structures

Multimodal representations play an important role in multimodal learning tasks, including cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building one common space by aligning different modalities and ignore the complementary...


Bibliographic Details
Main Authors: Bin Dong, Songlei Jian, Kai Lu
Format: Article
Language: English
Published: MDPI AG 2020-09-01
Series: Symmetry
Subjects: multimodal representations, soft metric learning, local structure, neural networks
Online Access: https://www.mdpi.com/2073-8994/12/9/1504
_version_ 1797553863980482560
author Bin Dong
Songlei Jian
Kai Lu
author_facet Bin Dong
Songlei Jian
Kai Lu
author_sort Bin Dong
collection DOAJ
description Multimodal representations play an important role in multimodal learning tasks, including cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building one common space by aligning different modalities and ignore the complementary information across the modalities, such as the intra-modal local structures. In other words, they only focus on the object-level alignment and ignore structure-level alignment. To tackle the problem, we propose a novel symmetric multimodal representation learning framework by transferring local structures across different modalities, namely MTLS. A customized soft metric learning strategy and an iterative parameter learning process are designed to symmetrically transfer local structures and enhance the cluster structures in intra-modal representations. The bidirectional retrieval loss based on multi-layer neural networks is utilized to align two modalities. MTLS is instantiated with image and text data and shows its superior performance on image-text retrieval and image clustering. MTLS outperforms the state-of-the-art multimodal learning methods by up to 32% in terms of R@1 on text-image retrieval and 16.4% in terms of AMI on clustering.
first_indexed 2024-03-10T16:22:40Z
format Article
id doaj.art-947c9fb294b84ac9ad71a02d72d756ba
institution Directory Open Access Journal
issn 2073-8994
language English
last_indexed 2024-03-10T16:22:40Z
publishDate 2020-09-01
publisher MDPI AG
record_format Article
series Symmetry
spelling doaj.art-947c9fb294b84ac9ad71a02d72d756ba
2023-11-20T13:34:06Z
eng
MDPI AG
Symmetry
2073-8994
2020-09-01
volume 12, issue 9, article 1504
10.3390/sym12091504
Learning Multimodal Representations by Symmetrically Transferring Local Structures
Bin Dong, College of Computer, National University of Defense Technology, Changsha 410000, China
Songlei Jian, College of Computer, National University of Defense Technology, Changsha 410000, China
Kai Lu, College of Computer, National University of Defense Technology, Changsha 410000, China
Multimodal representations play an important role in multimodal learning tasks, including cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building one common space by aligning different modalities and ignore the complementary information across the modalities, such as the intra-modal local structures. In other words, they only focus on the object-level alignment and ignore structure-level alignment. To tackle the problem, we propose a novel symmetric multimodal representation learning framework by transferring local structures across different modalities, namely MTLS. A customized soft metric learning strategy and an iterative parameter learning process are designed to symmetrically transfer local structures and enhance the cluster structures in intra-modal representations. The bidirectional retrieval loss based on multi-layer neural networks is utilized to align two modalities. MTLS is instantiated with image and text data and shows its superior performance on image-text retrieval and image clustering. MTLS outperforms the state-of-the-art multimodal learning methods by up to 32% in terms of R@1 on text-image retrieval and 16.4% in terms of AMI on clustering.
https://www.mdpi.com/2073-8994/12/9/1504
multimodal representations
soft metric learning
local structure
neural networks
spellingShingle Bin Dong
Songlei Jian
Kai Lu
Learning Multimodal Representations by Symmetrically Transferring Local Structures
Symmetry
multimodal representations
soft metric learning
local structure
neural networks
title Learning Multimodal Representations by Symmetrically Transferring Local Structures
title_full Learning Multimodal Representations by Symmetrically Transferring Local Structures
title_fullStr Learning Multimodal Representations by Symmetrically Transferring Local Structures
title_full_unstemmed Learning Multimodal Representations by Symmetrically Transferring Local Structures
title_short Learning Multimodal Representations by Symmetrically Transferring Local Structures
title_sort learning multimodal representations by symmetrically transferring local structures
topic multimodal representations
soft metric learning
local structure
neural networks
url https://www.mdpi.com/2073-8994/12/9/1504
work_keys_str_mv AT bindong learningmultimodalrepresentationsbysymmetricallytransferringlocalstructures
AT songleijian learningmultimodalrepresentationsbysymmetricallytransferringlocalstructures
AT kailu learningmultimodalrepresentationsbysymmetricallytransferringlocalstructures