YouMVOS: an actor-centric multi-shot video object segmentation dataset
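Main Authors: | Wei, D; Kharbanda, S; Arora, S; Roy, R; Jain, N; Palrecha, A; Shah, T; Mathur, S; Mathur, R; Kemkar, A; Chakravarthy, A; Lin, Z; Jang, W-D; Tang, Y; Bai, S; Tompkin, J; Torr, PHS; Pfister, H |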
---|---|
Format: | Conference item |
Language: | English |
Published: | IEEE, 2022 |
author | Wei, D; Kharbanda, S; Arora, S; Roy, R; Jain, N; Palrecha, A; Shah, T; Mathur, S; Mathur, R; Kemkar, A; Chakravarthy, A; Lin, Z; Jang, W-D; Tang, Y; Bai, S; Tompkin, J; Torr, PHS; Pfister, H |
---|---|
author_sort | Wei, D |
collection | OXFORD |
description | Many video understanding tasks require analyzing multi-shot videos, but existing datasets for video object segmentation (VOS) consider only single-shot videos. To address this gap, we collected a new dataset, YouMVOS, of 200 popular YouTube videos spanning ten genres; each video is on average five minutes long and contains 75 shots. We selected recurring actors and annotated 431K segmentation masks at six frames per second, exceeding previous datasets in average video duration, object variation, and narrative structure complexity. We incorporated good practices in model architecture design, memory management, and multi-shot tracking into an existing video segmentation method to build competitive baselines. Through error analysis, we found that these baselines still fail to cope with cross-shot appearance variation on YouMVOS. Our dataset thus poses new challenges in multi-shot segmentation toward better video analysis. Data, code, and pre-trained models are available at https://donglaiw.github.io/proj/youMVOS |
first_indexed | 2024-03-07T07:31:21Z |
format | Conference item |
id | oxford-uuid:0f7dbe82-dc2c-4d3b-a148-caf4253aa181 |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T07:31:21Z |
publishDate | 2022 |
publisher | IEEE |
record_format | dspace |
title | YouMVOS: an actor-centric multi-shot video object segmentation dataset |
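The description above states that masks were annotated at six frames per second, but the record does not say how those frames were chosen. The sketch below is a hypothetical illustration only, assuming uniform temporal subsampling of a constant-frame-rate source video; the function name `sample_frame_indices` and the 30 fps source rate in the example are assumptions, not details from the paper. As a back-of-envelope check, a video of the reported average length (five minutes, or 300 s) yields 300 × 6 = 1,800 annotated frames at this rate.

```python
from __future__ import annotations


def sample_frame_indices(num_frames: int, source_fps: float,
                         target_fps: float = 6.0) -> list[int]:
    """Hypothetical helper: pick source-frame indices on a uniform target_fps grid.

    Assumes a constant source frame rate; this is an illustration, not the
    YouMVOS annotation pipeline.
    """
    if num_frames <= 0 or source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame count and frame rates must be positive")
    # Number of target-rate samples that fit in the clip's duration.
    n_samples = int(num_frames * target_fps / source_fps)
    indices = []
    for k in range(n_samples):
        # Source frame nearest to time k / target_fps (round() breaks ties to even).
        idx = round(k * source_fps / target_fps)
        indices.append(min(idx, num_frames - 1))
    return indices


# A five-minute clip at an assumed 30 fps has 9,000 frames; sampling at 6 fps
# keeps every fifth frame, i.e. 1,800 frames.
idx = sample_frame_indices(9000, 30.0)
print(len(idx), idx[:4])
```

When the source rate is an integer multiple of the target rate, as in the 30 fps example, this reduces to keeping every n-th frame; rounding to the nearest frame also handles rates such as 25 fps that do not divide evenly.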