Dynamic graph message passing networks for visual recognition

Bibliographic Details
Main Authors: Zhang, L, Chen, M, Arnab, A, Xue, X, Torr, PHS
Format: Journal article
Language: English
Published: Institute of Electrical and Electronics Engineers, 2022
Description: Modelling long-range dependencies is critical for scene understanding tasks in computer vision. Although convolutional neural networks (CNNs) have excelled in many vision tasks, they are still limited in capturing long-range structured relationships, as they typically consist of layers of local kernels. A fully-connected graph, such as the self-attention operation in Transformers, is beneficial for such modelling; however, its computational overhead is prohibitive. In this paper, we propose a dynamic graph message passing network that significantly reduces the computational complexity compared to related works that model a fully-connected graph. This is achieved by adaptively sampling nodes in the graph, conditioned on the input, for message passing. Based on the sampled nodes, we dynamically predict node-dependent filter weights and the affinity matrix for propagating information between them. This formulation allows us to design a self-attention module and, more importantly, a new Transformer-based backbone network, which we use both for image classification pretraining and for addressing various downstream tasks (e.g. object detection, instance and semantic segmentation). Using this model, we show significant improvements with respect to strong, state-of-the-art baselines on four different tasks. Our approach also outperforms fully-connected graphs while using substantially fewer floating-point operations and parameters. Code and models will be made publicly available at https://github.com/fudan-zvg/DGMN2.
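The mechanism the abstract outlines — input-conditioned node sampling, a predicted affinity over the sampled nodes, and node-dependent filter weights — can be illustrated with a toy NumPy sketch. Everything below (the projection matrices, the sigmoid gate, top-K scoring) is a hypothetical stand-in for the paper's learned components, not the authors' implementation:

```python
import numpy as np

def dynamic_message_passing(x, K=4, seed=0):
    """Toy sketch of one round of dynamic graph message passing.

    x: (N, C) array of node features (e.g. flattened feature-map positions).
    For each node we (1) sample K peer nodes conditioned on the input,
    (2) predict an affinity over those K nodes, (3) predict node-dependent
    filter weights, and (4) aggregate the weighted messages.
    The random projections below stand in for learned parameters.
    """
    rng = np.random.default_rng(seed)
    N, C = x.shape
    Wq = rng.standard_normal((C, C)) / np.sqrt(C)
    Wk = rng.standard_normal((C, C)) / np.sqrt(C)
    Wf = rng.standard_normal((C, C)) / np.sqrt(C)

    q, k = x @ Wq, x @ Wk
    scores = q @ k.T / np.sqrt(C)                       # (N, N) pairwise scores
    # (1) Input-conditioned sampling: keep only the top-K nodes per row,
    # so each node exchanges messages with K peers instead of all N.
    idx = np.argpartition(scores, -K, axis=1)[:, -K:]   # (N, K) sampled indices
    sub = np.take_along_axis(scores, idx, axis=1)       # (N, K) sampled scores
    # (2) Affinity matrix over the sampled nodes (softmax per node).
    aff = np.exp(sub - sub.max(axis=1, keepdims=True))
    aff /= aff.sum(axis=1, keepdims=True)
    # (3) Node-dependent filter weights: here, a per-node gating vector
    # predicted from the receiving node's own features.
    gate = 1.0 / (1.0 + np.exp(-(x @ Wf)))              # (N, C)
    # (4) Aggregate messages from the K sampled nodes, residual update.
    msgs = x[idx]                                       # (N, K, C)
    return x + gate * np.einsum('nk,nkc->nc', aff, msgs)

feat = np.random.default_rng(1).standard_normal((16, 8))
updated = dynamic_message_passing(feat, K=4)
print(updated.shape)  # (16, 8)
```

With K much smaller than N, the aggregation step costs O(N·K·C) rather than the O(N²·C) of a fully-connected graph. Note that the toy top-K scoring above still computes all pairwise scores; the paper's adaptive sampling is designed to avoid that full-graph cost.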