CPISNet: Delving into Consistent Proposals of Instance Segmentation Network for High-Resolution Aerial Images

Instance segmentation of high-resolution aerial images is challenging when compared to object detection and semantic segmentation in remote sensing applications. It adopts boundary-aware mask predictions, instead of traditional bounding boxes, to locate the objects-of-interest in pixel-wise. Meanwhi...

Full description

Bibliographic Details
Main Authors: Xiangfeng Zeng, Shunjun Wei, Jinshan Wei, Zichen Zhou, Jun Shi, Xiaoling Zhang, Fan Fan
Format: Article
Language:English
Published: MDPI AG 2021-07-01
Series:Remote Sensing
Subjects:
Online Access:https://www.mdpi.com/2072-4292/13/14/2788
Description
Summary:Instance segmentation of high-resolution aerial images is challenging when compared to object detection and semantic segmentation in remote sensing applications. It adopts boundary-aware mask predictions, instead of traditional bounding boxes, to locate the objects-of-interest in pixel-wise. Meanwhile, instance segmentation can distinguish the densely distributed objects within a certain category by a different color, which is unavailable in semantic segmentation. Despite the distinct advantages, there are rare methods which are dedicated to the high-quality instance segmentation for high-resolution aerial images. In this paper, a novel instance segmentation method, termed consistent proposals of instance segmentation network (CPISNet), for high-resolution aerial images is proposed. Following top-down instance segmentation formula, it adopts the adaptive feature extraction network (AFEN) to extract the multi-level bottom-up augmented feature maps in design space level. Then, elaborated RoI extractor (ERoIE) is designed to extract the mask RoIs via the refined bounding boxes from proposal consistent cascaded (PCC) architecture and multi-level features from AFEN. Finally, the convolution block with shortcut connection is responsible for generating the binary mask for instance segmentation. Experimental conclusions can be drawn on the iSAID and NWPU VHR-10 instance segmentation dataset: (1) Each individual module in CPISNet acts on the whole instance segmentation utility; (2) CPISNet* exceeds vanilla Mask R-CNN 3.4%/3.8% <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>A</mi><mi>P</mi></mrow></semantics></math></inline-formula> on iSAID validation/test set and 9.2% <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>A</mi><mi>P</mi></mrow></semantics></math></inline-formula> on NWPU VHR-10 instance segmentation dataset; (3) The aliasing masks, missing segmentations, false alarms, and poorly segmented masks can be avoided to some extent for CPISNet; (4) CPISNet receives high precision of instance segmentation for aerial images and interprets the objects with fitting boundary.
ISSN:2072-4292