Semantic Segmentation by Multi-Scale Feature Extraction Based on Grouped Dilated Convolution Module

Bibliographic Details
Main Authors: Dong Seop Kim, Yu Hwan Kim, Kang Ryoung Park
Format: Article
Language: English
Published: MDPI AG, 2021-04-01
Series: Mathematics
ISSN: 2227-7390
DOI: 10.3390/math9090947
Author Affiliations: Division of Electronics and Electrical Engineering, Dongguk University, 30 Pildong-ro, 1-gil, Jung-gu, Seoul 04620, Korea (all authors)
Subjects: semantic segmentation; pixel-level classification; grouped dilated convolution module; multi-scale context
Online Access: https://www.mdpi.com/2227-7390/9/9/947

Description
Existing studies have shown that effective extraction of multi-scale information is a crucial factor directly related to improving the performance of semantic segmentation. Accordingly, various methods for extracting multi-scale information have been developed. However, these methods require additional computation and vast computing resources. To address these problems, this study proposes a grouped dilated convolution module that combines existing grouped convolution and atrous spatial pyramid pooling techniques. The proposed method learns multi-scale features more simply and effectively than existing methods. Because each convolution group in the proposed module has a different dilation rate, the groups have receptive fields of different sizes and learn features corresponding to those receptive fields. As a result, multi-scale context can be extracted easily. Moreover, optimal hyper-parameters are obtained from an in-depth analysis, and excellent segmentation performance is achieved. To evaluate the proposed method, two open databases, the Cambridge Driving Labeled Video Database (CamVid) and the Stanford Background Dataset (SBD), are used. The experimental results show that the proposed method achieves a mean intersection over union of 73.15% on the CamVid dataset and 72.81% on the SBD, exhibiting excellent performance compared with other state-of-the-art methods.
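
The module described above can be illustrated with a minimal PyTorch-style sketch. This is an assumption-laden reconstruction from the abstract, not the authors' implementation: the channel split, 3x3 kernel size, and dilation rates (1, 2, 4, 8) are illustrative choices. Each channel group is convolved with its own dilation rate, so each group sees a different receptive field, and the concatenated output carries multi-scale context.

```python
import torch
import torch.nn as nn


class GroupedDilatedConv(nn.Module):
    """Sketch of a grouped dilated convolution block: the input channels are
    split into groups and each group is convolved with a different dilation
    rate, giving each group a receptive field of a different size."""

    def __init__(self, in_channels, out_channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        assert in_channels % len(dilations) == 0
        assert out_channels % len(dilations) == 0
        self.group_in = in_channels // len(dilations)
        group_out = out_channels // len(dilations)
        self.branches = nn.ModuleList(
            nn.Conv2d(self.group_in, group_out, kernel_size=3,
                      padding=d, dilation=d)  # padding=d keeps spatial size for a 3x3 kernel
            for d in dilations
        )

    def forward(self, x):
        # Split channels into groups, apply a differently dilated 3x3
        # convolution to each group, and concatenate the multi-scale outputs.
        groups = torch.split(x, self.group_in, dim=1)
        return torch.cat(
            [branch(g) for branch, g in zip(self.branches, groups)], dim=1)


# Example usage on a dummy feature map.
if __name__ == "__main__":
    module = GroupedDilatedConv(in_channels=64, out_channels=64)
    features = torch.randn(1, 64, 90, 120)
    print(module(features).shape)  # torch.Size([1, 64, 90, 120])
```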