Self-supervised video object segmentation by motion grouping



Bibliographic Details
Main Authors: Yang, C, Lamdouar, H, Lu, E, Zisserman, A, Xie, W
Format: Conference item
Language: English
Published: 2021
collection OXFORD
description Animals have evolved highly functional visual systems to understand motion, assisting perception even under complex environments. In this paper, we work towards developing a computer vision system able to segment objects by exploiting motion cues, i.e. motion segmentation. We make the following contributions: First, we introduce a simple variant of the Transformer to segment optical flow frames into primary objects and the background. Second, we train the architecture in a self-supervised manner, i.e. without using any manual annotations. Third, we analyze several critical components of our method and conduct thorough ablation studies to validate their necessity. Fourth, we evaluate the proposed architecture on public benchmarks (DAVIS2016, SegTrackv2, and FBMS59). Despite using only optical flow as input, our approach achieves superior or comparable results to previous state-of-the-art self-supervised methods, while being an order of magnitude faster. We additionally evaluate on a challenging camouflage dataset (MoCA), significantly outperforming the other self-supervised approaches, and comparing favourably to the top supervised approach, highlighting the importance of motion cues, and the potential bias towards visual appearance in existing video segmentation models.
first_indexed 2024-03-07T05:23:00Z
format Conference item
id oxford-uuid:df9212c9-c249-4246-8714-c0763111368b
institution University of Oxford
language English
last_indexed 2024-03-07T05:23:00Z
publishDate 2021
record_format dspace