Controllable attention for structured layered video decomposition

The objective of this paper is to be able to separate a video into its natural layers, and to control which of the separated layers to attend to. For example, to be able to separate reflections, transparency or object motion. We make the following three contributions: (i) we introduce a new structur...


Bibliographic Details
Main Authors:	Alayrac, J-B; Carreira, J; Arandjelovic, R; Zisserman, A
Format:	Conference item
Language:	English
Published:	IEEE 2020

author	Alayrac, J-B Carreira, J Arandjelovic, R Zisserman, A
collection	OXFORD
description	The objective of this paper is to be able to separate a video into its natural layers, and to control which of the separated layers to attend to. For example, to be able to separate reflections, transparency or object motion. We make the following three contributions: (i) we introduce a new structured neural network architecture that explicitly incorporates layers (as spatial masks) into its design. This improves separation performance over previous general purpose networks for this task; (ii) we demonstrate that we can augment the architecture to leverage external cues such as audio for controllability and to help disambiguation; and (iii) we experimentally demonstrate the effectiveness of our approach and training procedure with controlled experiments while also showing that the proposed model can be successfully applied to real-world applications such as reflection removal and action recognition in cluttered scenes.
first_indexed	2024-03-07T01:33:13Z
format	Conference item
id	oxford-uuid:9447f7ad-376b-448b-ae1b-8884ca0618cc
institution	University of Oxford
language	English
last_indexed	2024-03-07T01:33:13Z
publishDate	2020
publisher	IEEE
record_format	dspace
