Video object segmentation via attention‐modulating networks

Bibliographic Details
Main Authors: Runfa Tang, Huihui Song, Kaihua Zhang, Sihao Jiang
Format: Article
Language: English
Published: Wiley 2019-04-01
Series: Electronics Letters
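Subjects: video object segmentation; attention‐modulating network; segmentation model; spatial attention modulator; spatial attention module; feature pyramid attention module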
Online Access: https://doi.org/10.1049/el.2019.0304
description This Letter presents an attention‐modulating network for video object segmentation that adapts its segmentation model to the annotated frame. Specifically, the authors first develop an efficient visual and spatial attention modulator that quickly modulates the segmentation model to focus on the specific object of interest. They then design a channel and spatial attention module and inject it into the segmentation model to further refine its feature maps. In addition, to fuse multi‐scale context information, they construct a feature pyramid attention module that further processes the top‐layer feature maps, achieving better pixel‐level attention for the high‐level feature maps. Finally, to address the sample imbalance issue in training, they employ focal loss, which distinguishes easy samples from difficult ones, to accelerate the convergence of network training. Extensive evaluations on the DAVIS2017 dataset show that the proposed approach achieves state‐of‐the‐art performance, outperforming the baseline OSMN by 3.6% in IoU and 5.4% in F‐measure without fine‐tuning.
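The description names focal loss as the remedy for sample imbalance but does not give its form. The following is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), for a single pixel. The function names and the default values gamma = 2.0 and alpha = 0.25 are assumptions taken from common practice, not from this record, and the Letter's exact formulation and hyperparameters may differ.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Plain cross-entropy for one pixel; p is the predicted foreground probability."""
    p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one pixel (hypothetical helper, assumed defaults).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    (easy) pixels, so hard pixels dominate the gradient.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.95, 0.6, 0.2):
        ce = binary_cross_entropy(p, 1)
        fl = binary_focal_loss(p, 1)
        print(f"p={p:.2f}  cross-entropy={ce:.4f}  focal={fl:.6f}")
```

With gamma set to 0 the modulating factor disappears and the loss reduces to class-weighted cross-entropy, which makes it easy to compare the two during training.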
first_indexed 2024-12-20T05:08:27Z
format Article
id doaj.art-0d2128120a9945c4804da621bb97c656
institution Directory of Open Access Journals
issn 0013-5194
1350-911X
language English
last_indexed 2024-12-20T05:08:27Z
publishDate 2019-04-01
publisher Wiley
record_format Article
series Electronics Letters
spelling doaj.art-0d2128120a9945c4804da621bb97c656 2022-12-21T19:52:18Z eng
Wiley, Electronics Letters (ISSN 0013-5194, 1350-911X), 2019-04-01, vol. 55, no. 8, pp. 455-457
10.1049/el.2019.0304
Video object segmentation via attention‐modulating networks
Runfa Tang, Huihui Song, Kaihua Zhang, Sihao Jiang
Affiliation (all four authors): Jiangsu Key Laboratory of Big Data Analysis Technology (B‐DAT) and Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing, People's Republic of China
topic video object segmentation
attention‐modulating network
segmentation model
spatial attention modulator
spatial attention module
feature pyramid attention module