Occluded video instance segmentation: A benchmark

Can our video understanding systems perceive objects when a heavy occlusion exists in a scene? To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, that is, to simultaneously detect, segment, and track instances in occluded scenes. OVIS consists of 296k high-quality instance masks from 25 semantic categories, where object occlusions usually occur. While our human vision systems can understand those occluded instances by contextual reasoning and association, our experiments suggest that current video understanding systems cannot. On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario. We also present a simple plug-and-play module that performs temporal feature calibration to complement missing object cues caused by occlusion. Built upon MaskTrack R-CNN and SipMask, we obtain a remarkable AP improvement on the OVIS dataset. The OVIS dataset and project code are available at http://songbai.site/ovis.
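The abstract mentions a plug-and-play temporal feature calibration module but gives no implementation details here. Purely as an illustration of the general idea (borrowing features from a reference frame to compensate for cues occluded in the current frame), a minimal PyTorch sketch might look like the following; the class name, layer choices, and gating scheme are assumptions for illustration, not the authors' implementation:

import torch
import torch.nn as nn

class TemporalFeatureCalibration(nn.Module):
    # Hypothetical sketch: blend current-frame features with a reference frame.
    # A per-pixel gate is predicted from both frames, so evidence that is
    # occluded in the current frame can be taken from the reference frame.
    # This is an illustrative stand-in, not the module described in the paper.
    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel gate from the concatenated feature maps.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_cur: torch.Tensor, feat_ref: torch.Tensor) -> torch.Tensor:
        # feat_cur, feat_ref: (N, C, H, W) backbone features of the current
        # and reference frames.
        w = self.gate(torch.cat([feat_cur, feat_ref], dim=1))
        # Keep current-frame evidence where it is reliable, fall back to the
        # reference frame where it is not.
        return w * feat_cur + (1.0 - w) * feat_ref

# Example usage with random tensors standing in for backbone features.
if __name__ == "__main__":
    tfc = TemporalFeatureCalibration(channels=256)
    cur = torch.randn(1, 256, 64, 64)
    ref = torch.randn(1, 256, 64, 64)
    print(tfc(cur, ref).shape)  # torch.Size([1, 256, 64, 64])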

Bibliographic Details
Main Authors: Qi, J, Gao, Y, Hu, Y, Wang, X, Liu, X, Bai, X, Belongie, S, Yuille, A, Torr, PHS, Bai, S
Format: Journal article
Language: English
Published: Springer 2022