Learning Multimodal Representations by Symmetrically Transferring Local Structures

Multimodal representations play an important role in multimodal learning tasks, including cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building one common space by aligning different modalities and ignore the complementary...


Bibliographic Details
Main Authors:	Bin Dong, Songlei Jian, Kai Lu
Format:	Article
Language:	English
Published:	MDPI AG 2020-09-01
Series:	Symmetry
Subjects:	multimodal representations; soft metric learning; local structure; neural networks
Online Access:	https://www.mdpi.com/2073-8994/12/9/1504

_version_	1797553863980482560
author	Bin Dong Songlei Jian Kai Lu
author_facet	Bin Dong Songlei Jian Kai Lu
author_sort	Bin Dong
collection	DOAJ
description	Multimodal representations play an important role in multimodal learning tasks, including cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building one common space by aligning different modalities and ignore the complementary information across the modalities, such as the intra-modal local structures. In other words, they only focus on the object-level alignment and ignore structure-level alignment. To tackle the problem, we propose a novel symmetric multimodal representation learning framework by transferring local structures across different modalities, namely MTLS. A customized soft metric learning strategy and an iterative parameter learning process are designed to symmetrically transfer local structures and enhance the cluster structures in intra-modal representations. The bidirectional retrieval loss based on multi-layer neural networks is utilized to align two modalities. MTLS is instantiated with image and text data and shows its superior performance on image-text retrieval and image clustering. MTLS outperforms the state-of-the-art multimodal learning methods by up to 32% in terms of R@1 on text-image retrieval and 16.4% in terms of AMI on clustering.
first_indexed	2024-03-10T16:22:40Z
format	Article
id	doaj.art-947c9fb294b84ac9ad71a02d72d756ba
institution	Directory of Open Access Journals
issn	2073-8994
language	English
last_indexed	2024-03-10T16:22:40Z
publishDate	2020-09-01
publisher	MDPI AG
record_format	Article
series	Symmetry
spelling	doaj.art-947c9fb294b84ac9ad71a02d72d756ba 2023-11-20T13:34:06Z eng MDPI AG Symmetry 2073-8994 2020-09-01 12 9 1504 10.3390/sym12091504 Learning Multimodal Representations by Symmetrically Transferring Local Structures Bin Dong 0 Songlei Jian 1 Kai Lu 2 College of Computer, National University of Defense Technology, Changsha 410000, China College of Computer, National University of Defense Technology, Changsha 410000, China College of Computer, National University of Defense Technology, Changsha 410000, China Multimodal representations play an important role in multimodal learning tasks, including cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building one common space by aligning different modalities and ignore the complementary information across the modalities, such as the intra-modal local structures. In other words, they only focus on the object-level alignment and ignore structure-level alignment. To tackle the problem, we propose a novel symmetric multimodal representation learning framework by transferring local structures across different modalities, namely MTLS. A customized soft metric learning strategy and an iterative parameter learning process are designed to symmetrically transfer local structures and enhance the cluster structures in intra-modal representations. The bidirectional retrieval loss based on multi-layer neural networks is utilized to align two modalities. MTLS is instantiated with image and text data and shows its superior performance on image-text retrieval and image clustering. MTLS outperforms the state-of-the-art multimodal learning methods by up to 32% in terms of R@1 on text-image retrieval and 16.4% in terms of AMI on clustering. https://www.mdpi.com/2073-8994/12/9/1504 multimodal representations soft metric learning local structure neural networks
spellingShingle	Bin Dong Songlei Jian Kai Lu Learning Multimodal Representations by Symmetrically Transferring Local Structures Symmetry multimodal representations soft metric learning local structure neural networks
title	Learning Multimodal Representations by Symmetrically Transferring Local Structures
title_full	Learning Multimodal Representations by Symmetrically Transferring Local Structures
title_fullStr	Learning Multimodal Representations by Symmetrically Transferring Local Structures
title_full_unstemmed	Learning Multimodal Representations by Symmetrically Transferring Local Structures
title_short	Learning Multimodal Representations by Symmetrically Transferring Local Structures
title_sort	learning multimodal representations by symmetrically transferring local structures
topic	multimodal representations; soft metric learning; local structure; neural networks
url	https://www.mdpi.com/2073-8994/12/9/1504
work_keys_str_mv	AT bindong learningmultimodalrepresentationsbysymmetricallytransferringlocalstructures AT songleijian learningmultimodalrepresentationsbysymmetricallytransferringlocalstructures AT kailu learningmultimodalrepresentationsbysymmetricallytransferringlocalstructures
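
The description in this record mentions a bidirectional retrieval loss and reports results as R@1. The record does not give the formulas, so the following is only a minimal sketch under stated assumptions: it uses a hinge-based ranking loss with a margin (a common choice for image-text retrieval, not necessarily the one used in MTLS), dot-product similarity, and hypothetical helper names (`similarity_matrix`, `bidirectional_hinge_loss`, `recall_at_1`). Row `i` of the similarity matrix is image `i`, column `j` is text `j`, and matching pairs share an index.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def similarity_matrix(images: Sequence[Sequence[float]],
                      texts: Sequence[Sequence[float]]) -> Matrix:
    """sim[i][j] = similarity between image i and text j (assumed: dot product)."""
    return [[dot(img, txt) for txt in texts] for img in images]


def bidirectional_hinge_loss(sim: Matrix, margin: float = 0.2) -> float:
    """Assumed hinge ranking loss summed over both retrieval directions.

    For each matched pair (i, i), every non-matching text j is penalized when
    image i scores it within `margin` of the true text, and every non-matching
    image j is penalized when text i scores it within `margin` of the true image.
    The total is averaged over the number of pairs.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # Image i as query, text j as negative.
            total += max(0.0, margin - pos + sim[i][j])
            # Text i as query, image j as negative.
            total += max(0.0, margin - pos + sim[j][i])
    return total / n


def recall_at_1(sim: Matrix) -> float:
    """Fraction of row queries whose top-ranked column is the matching one."""
    n = len(sim)
    hits = sum(
        1 for i in range(n)
        if max(range(n), key=lambda j: sim[i][j]) == i
    )
    return hits / n


def transpose(sim: Matrix) -> Matrix:
    """Swap query and candidate roles (image-to-text becomes text-to-image)."""
    return [list(col) for col in zip(*sim)]
```

Calling `recall_at_1(sim)` scores image-to-text retrieval; `recall_at_1(transpose(sim))` scores the text-to-image direction that the description reports.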


Similar Items