YouMVOS: an actor-centric multi-shot video object segmentation dataset

Many video understanding tasks require analyzing multi-shot videos, but existing datasets for video object segmentation (VOS) only consider single-shot videos. To address this challenge, we collected a new dataset, YouMVOS, of 200 popular YouTube videos spanning ten genres, where each video is on avera...

Bibliographic Details
Main Authors: Wei, D, Kharbanda, S, Arora, S, Roy, R, Jain, N, Palrecha, A, Shah, T, Mathur, S, Mathur, R, Kemkar, A, Chakravarthy, A, Lin, Z, Jang, W-D, Tang, Y, Bai, S, Tompkin, J, Torr, PHS, Pfister, H
Format: Conference item
Language: English
Published: IEEE 2022
author Wei, D
Kharbanda, S
Arora, S
Roy, R
Jain, N
Palrecha, A
Shah, T
Mathur, S
Mathur, R
Kemkar, A
Chakravarthy, A
Lin, Z
Jang, W-D
Tang, Y
Bai, S
Tompkin, J
Torr, PHS
Pfister, H
author_sort Wei, D
collection OXFORD
description Many video understanding tasks require analyzing multi-shot videos, but existing datasets for video object segmentation (VOS) only consider single-shot videos. To address this challenge, we collected a new dataset, YouMVOS, of 200 popular YouTube videos spanning ten genres, where each video averages five minutes in length and contains 75 shots. We selected recurring actors and annotated 431K segmentation masks at six frames per second, exceeding previous datasets in average video duration, object variation, and narrative structure complexity. We incorporated good practices of model architecture design, memory management, and multi-shot tracking into an existing video segmentation method to build competitive baseline methods. Through error analysis, we found that these baselines still fail to cope with cross-shot appearance variation on our YouMVOS dataset. Thus, our dataset poses new challenges in multi-shot segmentation towards better video analysis. Data, code, and pre-trained models are available at https://donglaiw.github.io/proj/youMVOS
first_indexed 2024-03-07T07:31:21Z
format Conference item
id oxford-uuid:0f7dbe82-dc2c-4d3b-a148-caf4253aa181
institution University of Oxford
language English
last_indexed 2024-03-07T07:31:21Z
publishDate 2022
publisher IEEE
record_format dspace
spelling oxford-uuid:0f7dbe82-dc2c-4d3b-a148-caf4253aa181 2023-02-03T10:52:28Z Conference item http://purl.org/coar/resource_type/c_5794 uuid:0f7dbe82-dc2c-4d3b-a148-caf4253aa181 English Symplectic Elements IEEE 2022
title YouMVOS: an actor-centric multi-shot video object segmentation dataset