Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation

With the rapid development of flexible vision sensors and visual sensor networks, computer vision tasks such as object detection and tracking are entering a new phase. Accordingly, more challenging comprehensive tasks, including instance segmentation, can also develop rapidly. Most state-of-the-art...

Bibliographic Details
Main Authors:	Yiqing Zhang, Jun Chu, Lu Leng, Jun Miao
Format:	Article
Language:	English
Published:	MDPI AG 2020-02-01
Series:	Sensors
Subjects:	instance segmentation, multi-scale feature fusion, mask-refined r-cnn, roialign adjustment
Online Access:	https://www.mdpi.com/1424-8220/20/4/1010

_version_	1828154828983894016
author	Yiqing Zhang Jun Chu Lu Leng Jun Miao
author_facet	Yiqing Zhang Jun Chu Lu Leng Jun Miao
author_sort	Yiqing Zhang
collection	DOAJ
description	With the rapid development of flexible vision sensors and visual sensor networks, computer vision tasks such as object detection and tracking are entering a new phase. Accordingly, more challenging comprehensive tasks, including instance segmentation, can also develop rapidly. Most state-of-the-art network frameworks for instance segmentation are based on Mask R-CNN (mask region-convolutional neural network). However, the experimental results confirm that Mask R-CNN does not always successfully predict instance details. The scale-invariant fully convolutional network structure of Mask R-CNN ignores the difference in spatial information between receptive fields of different sizes. A large-scale receptive field focuses more on detailed information, whereas a small-scale receptive field focuses more on semantic information. As a result, the network cannot consider the relationship between the pixels at the object edge, and these pixels are misclassified. To overcome this problem, Mask-Refined R-CNN (MR R-CNN) is proposed, in which the stride of ROIAlign (region of interest align) is adjusted. In addition, the original fully convolutional layer is replaced with a new semantic segmentation layer that realizes feature fusion by constructing a feature pyramid network and summing the forward and backward transmissions of feature maps of the same resolution. The segmentation accuracy is substantially improved by combining the feature layers that focus on the global and detailed information. The experimental results on the COCO (Common Objects in Context) and Cityscapes datasets demonstrate that the segmentation accuracy of MR R-CNN is about 2% higher than that of Mask R-CNN using the same backbone. The average precision of large instances reaches 56.6%, which is higher than that of all state-of-the-art methods. In addition, the proposed method has a low time cost and is easy to implement. The experiments on the Cityscapes dataset also demonstrate that the proposed method generalizes well.
first_indexed	2024-04-11T22:45:31Z
format	Article
id	doaj.art-a2acdfcc18d3430ba6d87125c29701e5
institution	Directory Open Access Journal
issn	1424-8220
language	English
last_indexed	2024-04-11T22:45:31Z
publishDate	2020-02-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj.art-a2acdfcc18d3430ba6d87125c29701e5 2022-12-22T03:58:46Z eng MDPI AG Sensors 1424-8220 2020-02-01 20 4 1010 10.3390/s20041010 s20041010 Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation Yiqing Zhang (0) Jun Chu (1) Lu Leng (2) Jun Miao (3) (0-3) Department of Key Laboratory of Jiangxi Province for Image Processing and Pattern Recognition, Nanchang Hangkong University, Nanchang 330063, China https://www.mdpi.com/1424-8220/20/4/1010 instance segmentation multi-scale feature fusion mask-refined r-cnn roialign adjustment
spellingShingle	Yiqing Zhang Jun Chu Lu Leng Jun Miao Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation Sensors instance segmentation multi-scale feature fusion mask-refined r-cnn roialign adjustment
title	Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation
title_full	Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation
title_fullStr	Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation
title_full_unstemmed	Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation
title_short	Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation
title_sort	mask refined r cnn a network for refining object details in instance segmentation
topic	instance segmentation multi-scale feature fusion mask-refined r-cnn roialign adjustment
url	https://www.mdpi.com/1424-8220/20/4/1010
work_keys_str_mv	AT yiqingzhang maskrefinedrcnnanetworkforrefiningobjectdetailsininstancesegmentation AT junchu maskrefinedrcnnanetworkforrefiningobjectdetailsininstancesegmentation AT luleng maskrefinedrcnnanetworkforrefiningobjectdetailsininstancesegmentation AT junmiao maskrefinedrcnnanetworkforrefiningobjectdetailsininstancesegmentation
