MoNet : deep motion exploitation for video object segmentation

Bibliographic Details
Main Authors: Xiao, Huaxin, Feng, Jiashi, Lin, Guosheng, Liu, Yu, Zhang, Maojun
Other Authors: School of Computer Science and Engineering
Format: Conference Paper
Language: English
Published: 2018 (deposited in the repository 2020)
Subjects: Engineering::Computer science and engineering; Motion Segmentation; Feature Extraction
Online Access:https://hdl.handle.net/10356/143257
collection NTU
description In this paper, we propose a novel MoNet model that deeply exploits motion cues to boost video object segmentation performance in two respects: frame representation learning and segmentation refinement. Concretely, MoNet exploits a computed motion cue (i.e., optical flow) to reinforce the representation of the target frame by aligning and integrating representations from its neighbors. The new representation provides valuable temporal context for segmentation and improves robustness to common contaminating factors, e.g., motion blur, appearance variation and deformation of video objects. Moreover, MoNet exploits motion inconsistency and transforms this motion cue into a foreground/background prior that eliminates distraction from confusing instances and noisy regions. By introducing a distance transform layer, MoNet can effectively separate motion-inconsistent instances/regions and thoroughly refine segmentation results. By integrating the two proposed motion exploitation components with a standard segmentation network, MoNet achieves new state-of-the-art performance on three competitive benchmark datasets.
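The two motion components described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes single-channel feature maps stored as lists of rows and optical flow given as (u, v) pairs from the target frame to each neighbor. `warp` and `aggregate` show flow-guided alignment (backward warping with bilinear sampling) followed by uniform averaging; the uniform weights are an assumption, not MoNet's learned integration. `motion_prior` is a hypothetical stand-in for the motion-inconsistency prior: it scores each pixel by the flow-space distance between its motion and the mean motion inside a coarse foreground mask. It is not the paper's distance transform layer. All function names and parameters (including `sigma`) are illustrative.

```python
import math


def bilinear_sample(feat, y, x):
    """Bilinearly sample a single-channel map (list of rows) at fractional (y, x).

    Positions outside the map contribute zero (zero padding).
    """
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    total = 0.0
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * feat[yy][xx]
    return total


def warp(neighbor_feat, flow):
    """Backward-warp a neighbor frame's features onto the target frame's grid.

    flow[y][x] = (u, v) is the assumed displacement from target pixel (y, x)
    to its matching location in the neighbor frame (u horizontal, v vertical).
    """
    h, w = len(flow), len(flow[0])
    return [[bilinear_sample(neighbor_feat, y + flow[y][x][1], x + flow[y][x][0])
             for x in range(w)]
            for y in range(h)]


def aggregate(target_feat, neighbor_feats, flows):
    """Average the target features with flow-aligned neighbor features.

    Uniform weights are an assumption made for illustration; MoNet's own
    integration scheme is not reproduced here.
    """
    aligned = [target_feat] + [warp(f, fl) for f, fl in zip(neighbor_feats, flows)]
    h, w, n = len(target_feat), len(target_feat[0]), len(aligned)
    return [[sum(a[y][x] for a in aligned) / n for x in range(w)] for y in range(h)]


def motion_prior(flow, coarse_mask, sigma=1.0):
    """Hypothetical foreground prior from motion inconsistency (not the paper's layer).

    Each pixel is scored by exp(-d / sigma), where d is the distance in flow
    space between its flow vector and the mean flow inside coarse_mask, so
    pixels moving unlike the coarse foreground receive low prior values.
    """
    fg = [flow[y][x] for y in range(len(flow)) for x in range(len(flow[0]))
          if coarse_mask[y][x]]
    if not fg:  # no foreground estimate: return an all-zero prior
        return [[0.0] * len(row) for row in flow]
    mu_u = sum(u for u, _ in fg) / len(fg)
    mu_v = sum(v for _, v in fg) / len(fg)
    return [[math.exp(-math.hypot(u - mu_u, v - mu_v) / sigma) for (u, v) in row]
            for row in flow]
```

For example, with a uniform flow of (1, 0), `warp([[0, 1, 2]], [[(1.0, 0.0)] * 3])` shifts the row left to `[[1.0, 2.0, 0.0]]`, zero-filling the pixel whose match falls outside the neighbor frame. A practical implementation would operate on batched multi-channel tensors with learned fusion weights; the list-based form here only aims at readability.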
institution Nanyang Technological University
Record last modified: 2020-08-17T05:05:17Z
Conference: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018 CVPR)
Funding: Ministry of Education (MOE). Huaxin Xiao was supported by the China Scholarship Council under Grant 201603170287. Jiashi Feng was partially supported by NUS startup R-263-000-C08-133, MOE Tier-I R-263-000-C21-112, NUS IDS R-263-000-C67-646 and ECRA R-263-000-C87-133.
Version: Accepted version
Date accessioned: 2020-08-17T05:05:16Z
Date available: 2020-08-17T05:05:16Z
Date issued: 2018
Citation: Xiao, H., Feng, J., Lin, G., Liu, Y. & Zhang, M. (2018). MoNet : deep motion exploitation for video object segmentation. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018 CVPR), pp. 1140-1148. doi:10.1109/CVPR.2018.00125
ISBN: 978-1-5386-6421-6
DOI: 10.1109/CVPR.2018.00125
Scopus EID: 2-s2.0-85062869824
Pages: 1140-1148
Rights: © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/CVPR.2018.00125.
File format: application/pdf
topic Engineering::Computer science and engineering
Motion Segmentation
Feature Extraction