An encoder‐decoder framework with dynamic convolution for weakly supervised instance segmentation

Abstract In the systems of industrial robotics and autonomous vehicles, instance segmentation is widely employed. However, manually labelling an object outline is time‐consuming. In order to reduce annotation costs, we present a weakly supervised instance segmentation method in this article. A deepl...

Bibliographic Details
Main Authors: Liangjun Zhu, Li Peng, Shuchen Ding, Zhongren Liu
Format: Article
Language: English
Published: Wiley 2023-12-01
Series: IET Computer Vision
Subjects: image segmentation, object detection
Online Access: https://doi.org/10.1049/cvi2.12202
author Liangjun Zhu
Li Peng
Shuchen Ding
Zhongren Liu
collection DOAJ
description Abstract: Instance segmentation is widely employed in industrial robotics and autonomous vehicle systems. However, manually labelling object outlines is time‐consuming. To reduce annotation costs, we present a weakly supervised instance segmentation method in this article. A deep convolutional network is first used to construct multi‐scale feature maps for each object in the input image. An encoder‐decoder framework with dynamic convolution is then utilised to enhance model capacity and efficiency while avoiding the issues of anchor design, proposal selection, and RoIAlign implementation. In particular, Dynamic Heads in the encoder create dynamic convolution kernels, while Instance Heads in the decoder provide the global feature map. With dynamic convolution, each instance can be segmented independently, which reduces interference from other instances and improves segmentation accuracy. Under the supervision of a projection loss and a pixel point colour pairing loss, the contours of each object are finally outlined. On the PASCAL VOC and MS COCO datasets, the proposed method is competitive with more sophisticated approaches. On the VOC dataset, it achieves 37.6% average precision with ResNet‐101 and FPN networks. Extensive visualised results demonstrate the effectiveness of the proposed encoder‐decoder framework with dynamic convolution.
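Note: the abstract says that instance-specific kernels are generated and then applied to a shared global feature map, so that each instance is segmented independently. The following is a minimal sketch of that general idea only, not the authors' implementation. It assumes 1x1 kernels, a toy 2-channel feature map stored as nested lists, and hand-written kernel weights standing in for what a kernel-generating head would predict; all function and variable names are hypothetical.

```python
import math


def dynamic_conv_1x1(features, weights, bias=0.0):
    """Apply one instance-specific 1x1 kernel to a shared C x H x W feature map.

    Returns an H x W grid of mask logits for that instance.
    """
    channels = len(features)
    height = len(features[0])
    width = len(features[0][0])
    assert len(weights) == channels, "one weight per feature channel"
    return [
        [
            bias + sum(weights[c] * features[c][y][x] for c in range(channels))
            for x in range(width)
        ]
        for y in range(height)
    ]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def segment_instances(features, kernels, threshold=0.5):
    """Produce one binary mask per instance from per-instance (weights, bias) pairs."""
    masks = []
    for weights, bias in kernels:
        logits = dynamic_conv_1x1(features, weights, bias)
        masks.append(
            [[1 if sigmoid(v) > threshold else 0 for v in row] for row in logits]
        )
    return masks


# Toy shared feature map (assumed values): 2 channels, 2 x 2 pixels.
features = [
    [[1.0, 0.0], [0.0, 0.0]],  # channel 0 responds at the top-left pixel
    [[0.0, 0.0], [0.0, 1.0]],  # channel 1 responds at the bottom-right pixel
]

# Hand-written stand-ins for kernels a kernel-generating head would predict.
kernels = [
    ([4.0, -4.0], -2.0),  # instance 0: favours channel 0, suppresses channel 1
    ([-4.0, 4.0], -2.0),  # instance 1: favours channel 1, suppresses channel 0
]

masks = segment_instances(features, kernels)
for index, mask in enumerate(masks):
    print(f"instance {index}: {mask}")
```

Because every instance has its own kernel, the two masks are computed independently from the same feature map, with no region proposals or RoI cropping involved in this sketch.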
first_indexed 2024-03-08T22:33:12Z
format Article
id doaj.art-39e987ce06ba413b898b8575375656b1
institution Directory Open Access Journal
issn 1751-9632
1751-9640
language English
last_indexed 2024-03-08T22:33:12Z
publishDate 2023-12-01
publisher Wiley
record_format Article
series IET Computer Vision
spelling doaj.art-39e987ce06ba413b898b8575375656b1
2023-12-17T15:35:00Z
eng
Wiley
IET Computer Vision
1751-9632
1751-9640
2023-12-01
17(8): 883-894
10.1049/cvi2.12202
An encoder‐decoder framework with dynamic convolution for weakly supervised instance segmentation
Liangjun Zhu (Engineering Research Center of Internet of Things Applied Technology, Jiangnan University, Wuxi, China)
Li Peng (Engineering Research Center of Internet of Things Applied Technology, Jiangnan University, Wuxi, China)
Shuchen Ding (School of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, China)
Zhongren Liu (Engineering Research Center of Internet of Things Applied Technology, Jiangnan University, Wuxi, China)
https://doi.org/10.1049/cvi2.12202
image segmentation
object detection
title An encoder‐decoder framework with dynamic convolution for weakly supervised instance segmentation
topic image segmentation
object detection
url https://doi.org/10.1049/cvi2.12202