Video object segmentation via attention‐modulating networks
This Letter presents an attention‐modulating network for video object segmentation that adapts its segmentation model to the annotated frame. Specifically, the authors first develop an efficient visual and spatial attention modulator that quickly modulates the segmentation model to focus on the specific object of interest. They then design a channel and spatial attention module and inject it into the segmentation model to further refine its feature maps. In addition, to fuse multi‐scale context information, they construct a feature pyramid attention module that further processes the top‐layer feature maps, achieving better pixel‐level attention for the high‐level feature maps. Finally, to address the sample‐imbalance issue in training, they employ a focal loss that distinguishes simple samples from difficult ones, accelerating the convergence of network training. Extensive evaluations on the DAVIS2017 dataset show that the proposed approach achieves state‐of‐the‐art performance, outperforming the baseline OSMN by 3.6% and 5.4% in IoU and F‐measure, respectively, without fine‐tuning.
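The modulation step follows the conditional‐modulation idea of the OSMN baseline: the visual modulator produces channel‐wise scales from the annotated object, the spatial modulator produces a per‐position bias from a location prior, and together they re‐weight an intermediate feature map of the segmentation network. The sketch below is a minimal PyTorch illustration of that idea; the class name, guide dimensionality, and placement inside the network are assumptions for illustration, not the Letter's published design.

```python
import torch
import torch.nn as nn

class AttentionModulator(nn.Module):
    """Sketch of OSMN-style conditional modulation: a channel-wise scale
    derived from the visual guide and a spatial bias derived from the
    location prior re-weight an intermediate feature map. Names and
    shapes are illustrative, not the Letter's exact design."""

    def __init__(self, channels, guide_dim):
        super().__init__()
        # Visual modulator: embed the annotated object into one scale per channel.
        self.visual = nn.Linear(guide_dim, channels)
        # Spatial modulator: turn a single-channel location prior into a bias map.
        self.spatial = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, feat, guide_vec, location_prior):
        # feat: (B, C, H, W); guide_vec: (B, guide_dim); location_prior: (B, 1, H, W)
        scale = self.visual(guide_vec).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        bias = self.spatial(location_prior)                         # (B, 1, H, W)
        # Broadcast: channel-wise scaling plus per-position bias.
        return feat * scale + bias
```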
Main Authors: | Runfa Tang, Huihui Song, Kaihua Zhang, Sihao Jiang |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2019-04-01 |
Series: | Electronics Letters |
Subjects: | video object segmentation; attention‐modulating network; segmentation model; spatial attention modulator; spatial attention module; feature pyramid attention module |
Online Access: | https://doi.org/10.1049/el.2019.0304 |
_version_ | 1818934701079920640 |
---|---|
author | Runfa Tang; Huihui Song; Kaihua Zhang; Sihao Jiang |
author_facet | Runfa Tang; Huihui Song; Kaihua Zhang; Sihao Jiang |
author_sort | Runfa Tang |
collection | DOAJ |
description | This Letter presents an attention‐modulating network for video object segmentation that adapts its segmentation model to the annotated frame. Specifically, the authors first develop an efficient visual and spatial attention modulator that quickly modulates the segmentation model to focus on the specific object of interest. They then design a channel and spatial attention module and inject it into the segmentation model to further refine its feature maps. In addition, to fuse multi‐scale context information, they construct a feature pyramid attention module that further processes the top‐layer feature maps, achieving better pixel‐level attention for the high‐level feature maps. Finally, to address the sample‐imbalance issue in training, they employ a focal loss that distinguishes simple samples from difficult ones, accelerating the convergence of network training. Extensive evaluations on the DAVIS2017 dataset show that the proposed approach achieves state‐of‐the‐art performance, outperforming the baseline OSMN by 3.6% and 5.4% in IoU and F‐measure, respectively, without fine‐tuning. |
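The "channel and spatial attention module" is described only at this level of detail in the abstract; a common realisation of such a module is CBAM-style channel-then-spatial attention, sketched below in PyTorch. The class name, the reduction ratio, and the 7x7 kernel are conventional choices borrowed from CBAM, not values confirmed by the Letter.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style channel-then-spatial attention, shown as one common
    realisation of a 'channel and spatial attention module'; the
    Letter's exact design may differ."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite per channel.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over pooled channel statistics.
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel gate from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial gate from per-position mean and max over channels.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(s))
```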
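The focal loss mentioned in the description down-weights easy examples so that hard, misclassified pixels dominate the gradient. Below is a minimal per-pixel binary focal loss in the spirit of Lin et al.; the function name and the defaults gamma=2.0 and alpha=0.25 are standard choices assumed here, since the abstract does not give the Letter's exact settings.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Per-pixel binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    logits, targets: (B, 1, H, W) mask tensors, targets in {0, 1}.
    gamma=2.0 and alpha=0.25 follow Lin et al.'s defaults (assumed).
    """
    # Plain binary cross-entropy per pixel (no reduction yet).
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    # p_t: probability the model assigns to the true class of each pixel.
    p_t = p * targets + (1 - p) * (1 - targets)
    # alpha_t: class-balancing weight for foreground vs. background pixels.
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # Easy pixels (p_t near 1) are suppressed by the (1 - p_t)**gamma factor.
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()
```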
first_indexed | 2024-12-20T05:08:27Z |
format | Article |
id | doaj.art-0d2128120a9945c4804da621bb97c656 |
institution | Directory Open Access Journal |
issn | 0013-5194; 1350-911X |
language | English |
last_indexed | 2024-12-20T05:08:27Z |
publishDate | 2019-04-01 |
publisher | Wiley |
record_format | Article |
series | Electronics Letters |
spelling | doaj.art-0d2128120a9945c4804da621bb97c656; 2022-12-21T19:52:18Z; eng; Wiley; Electronics Letters; 0013-5194, 1350-911X; 2019-04-01; vol. 55, no. 8, pp. 455–457; 10.1049/el.2019.0304; Video object segmentation via attention‐modulating networks; Runfa Tang, Huihui Song, Kaihua Zhang, Sihao Jiang (all with: Jiangsu Key Laboratory of Big Data Analysis Technology (B‐DAT) and Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing, People's Republic of China); This Letter presents an attention‐modulating network for video object segmentation that adapts its segmentation model to the annotated frame. Specifically, the authors first develop an efficient visual and spatial attention modulator that quickly modulates the segmentation model to focus on the specific object of interest. They then design a channel and spatial attention module and inject it into the segmentation model to further refine its feature maps. In addition, to fuse multi‐scale context information, they construct a feature pyramid attention module that further processes the top‐layer feature maps, achieving better pixel‐level attention for the high‐level feature maps. Finally, to address the sample‐imbalance issue in training, they employ a focal loss that distinguishes simple samples from difficult ones, accelerating the convergence of network training. Extensive evaluations on the DAVIS2017 dataset show that the proposed approach achieves state‐of‐the‐art performance, outperforming the baseline OSMN by 3.6% and 5.4% in IoU and F‐measure, respectively, without fine‐tuning.; https://doi.org/10.1049/el.2019.0304; video object segmentation; attention‐modulating network; segmentation model; spatial attention modulator; spatial attention module; feature pyramid attention module |
spellingShingle | Runfa Tang; Huihui Song; Kaihua Zhang; Sihao Jiang; Video object segmentation via attention‐modulating networks; Electronics Letters; video object segmentation; attention‐modulating network; segmentation model; spatial attention modulator; spatial attention module; feature pyramid attention module |
title | Video object segmentation via attention‐modulating networks |
title_full | Video object segmentation via attention‐modulating networks |
title_fullStr | Video object segmentation via attention‐modulating networks |
title_full_unstemmed | Video object segmentation via attention‐modulating networks |
title_short | Video object segmentation via attention‐modulating networks |
title_sort | video object segmentation via attention modulating networks |
topic | video object segmentation; attention‐modulating network; segmentation model; spatial attention modulator; spatial attention module; feature pyramid attention module |
url | https://doi.org/10.1049/el.2019.0304 |
work_keys_str_mv | AT runfatang videoobjectsegmentationviaattentionmodulatingnetworks AT huihuisong videoobjectsegmentationviaattentionmodulatingnetworks AT kaihuazhang videoobjectsegmentationviaattentionmodulatingnetworks AT sihaojiang videoobjectsegmentationviaattentionmodulatingnetworks |