Meta-learning deep visual words for fast video object segmentation


Bibliographic Details
Main Authors: Behl, HS, Najafi, M, Arnab, A, Torr, PHS
Format: Conference item
Language:English
Published: 2019
description Accurate video object segmentation methods finetune a model using the first annotated frame, and/or use additional inputs such as optical flow and complex post-processing. In contrast, we develop a fast algorithm that requires no finetuning, auxiliary inputs or post-processing, and segments a variable number of objects in a single forward-pass. We represent an object with clusters, or “visual words”, in the embedding space, which correspond to object parts in the image space. This allows us to robustly match to the reference objects throughout the video, because although the global appearance of an object changes as it undergoes occlusions and deformations, the appearance of more local parts may stay consistent. We learn these visual words in an unsupervised manner, using meta-learning to ensure that our training objective matches our inference procedure. We achieve comparable accuracy to finetuning based methods, and state-of-the-art in terms of speed/accuracy trade-offs on four video segmentation datasets.
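The core part-based idea in the abstract — cluster an object's pixel embeddings into "visual words" and match new pixels to the nearest word rather than to a single global template — can be sketched in a few lines. This is a minimal illustration, not the paper's network or meta-learning loop; the function names, the use of plain k-means, and the choice of `k` are all assumptions for the sketch.

```python
import numpy as np

def learn_visual_words(embeddings, k=4, iters=10, seed=0):
    """Cluster one object's pixel embeddings (N, D) into k 'visual
    words' (cluster centres) with plain k-means."""
    rng = np.random.default_rng(seed)
    centres = embeddings[rng.choice(len(embeddings), k, replace=False)]
    for _ in range(iters):
        # assign each embedding to its nearest centre
        dists = np.linalg.norm(embeddings[:, None] - centres[None], axis=-1)
        labels = dists.argmin(axis=1)
        # move each centre to the mean of its assigned embeddings
        for j in range(k):
            members = embeddings[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)
    return centres

def match_to_words(query_embeddings, word_banks):
    """Assign each query-frame pixel (M, D) to an object id by its
    distance to the *closest* visual word of each object. Matching
    local parts, not a whole-object template, is what keeps matching
    stable under partial occlusion and deformation."""
    scores = []
    for words in word_banks:          # one (k, D) bank per object
        d = np.linalg.norm(query_embeddings[:, None] - words[None], axis=-1)
        scores.append(d.min(axis=1))  # distance to the nearest part
    return np.stack(scores).argmin(axis=0)
```

In this sketch a variable number of objects is handled simply by passing one word bank per reference object, and segmentation of a new frame is a single nearest-word lookup — mirroring, at toy scale, the single forward-pass inference the abstract describes.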
id oxford-uuid:c697af74-5ec8-4d5b-addc-bd2a0ae7e189
institution University of Oxford