Adaptive Template and Transition Map for Real-Time Video Object Segmentation

Bibliographic Details
Main Authors:	Hyojin Park, Jayeon Yoo, Ganesh Venkatesh, Nojun Kwak
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Semi-supervised video object segmentation; video object segmentation; video object tracking; deep learning
Online Access:	https://ieeexplore.ieee.org/document/9519655/

Description
Summary:	Semi-supervised video object segmentation (semi-VOS) is required for many visual applications. The task is to track class-agnostic objects from a given segmentation mask. Various approaches have achieved high accuracy in this field, but they are hard to use in real-world applications because of slow inference and high complexity. To significantly speed up inference while narrowing the performance gap with those models, we introduce a fast segmentation model based on template matching and an auxiliary loss with a transition map. Our template matching method consists of short-term and long-term matching. Short-term matching enhances target object localization by focusing on neighboring frames, while long-term matching improves fine details and handles changes in object shape by considering long-range frames. However, since both matching processes generate each template from previously estimated masks, errors propagate when tracking objects in subsequent frames. To mitigate this problem, we add an auxiliary loss with a newly proposed transition map, which encourages the model to correct its errors and produce accurate masks of the target object. Our model obtains an 81.1% J&F score at 78.3 FPS on the DAVIS16 benchmark and is 1.4× faster and 11.3% more accurate than SiamMask, one of the fast semi-VOS models.
ISSN:	2169-3536
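
Illustrative sketch (not from the article): the summary describes matching current-frame features against templates built from previously estimated masks. The NumPy code below shows one generic way such matching could look, assuming masked average pooling for the template, cosine similarity for the match, and a running average for a long-term template. All function names, the pooling choice, and the momentum value are hypothetical and are not taken from the paper.

```python
import numpy as np

def masked_template(features, mask, eps=1e-8):
    """Average the feature vectors inside the mask (assumed template form)."""
    # features: (C, H, W); mask: (H, W) with values in [0, 1]
    weights = mask / (mask.sum() + eps)
    return (features * weights[None]).sum(axis=(1, 2))  # shape (C,)

def similarity_map(features, template, eps=1e-8):
    """Cosine similarity between the template and every spatial location."""
    f = features / (np.linalg.norm(features, axis=0, keepdims=True) + eps)
    t = template / (np.linalg.norm(template) + eps)
    return np.tensordot(t, f, axes=([0], [0]))  # shape (H, W)

def update_long_term(long_term, new_template, momentum=0.9):
    """Running average over past templates (assumed update rule)."""
    return momentum * long_term + (1.0 - momentum) * new_template
```

In this sketch, a short-term template would come from the immediately preceding frame and its estimated mask, while the long-term template accumulates over many frames. Because both depend on estimated masks, an early mask error carries into later similarity maps, which is the error propagation the summary refers to.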

Similar Items