EAR-Net: Efficient Atrous Residual Network for Semantic Segmentation of Street Scenes Based on Deep Learning

Bibliographic Details
Main Authors: Seokyong Shin, Sanghun Lee, Hyunho Han
Format: Article
Language: English
Published: MDPI AG, 2021-09-01
Series: Applied Sciences
Subjects: atrous spatial pyramid pooling; deep learning; encoder–decoder; residual learning; semantic segmentation
Online Access: https://www.mdpi.com/2076-3417/11/19/9119
Collection: DOAJ
Description: Segmentation of street scenes is a key technology in the field of autonomous vehicles. However, conventional segmentation methods achieve low accuracy because of the complexity of street landscapes. We therefore propose an efficient atrous residual network (EAR-Net) that improves accuracy while keeping computation costs comparable. First, we perform feature extraction and restoration using depthwise separable convolution (DSConv) and interpolation. Compared with conventional methods, DSConv and interpolation significantly reduce computation costs while minimizing performance degradation. Second, we use residual learning and atrous spatial pyramid pooling (ASPP) to achieve high accuracy. Residual learning improves the extraction of context information by preventing feature and gradient losses, and ASPP extracts additional context information while maintaining the resolution of the feature map. Finally, to alleviate the class imbalance between the image background and objects and to improve learning efficiency, we use focal loss. We evaluated EAR-Net on the Cityscapes dataset, which is commonly used in street-scene segmentation studies. Experimental results show that EAR-Net achieves better segmentation results than conventional methods at similar computation costs. We also conducted an ablation study to analyze the contributions of ASPP and DSConv in EAR-Net.
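The abstract's claim that DSConv reduces computation can be illustrated with a simple multiply-accumulate (MAC) count: a depthwise separable convolution factors a standard convolution into a per-channel depthwise pass plus a 1×1 pointwise pass. The layer sizes below are illustrative, not taken from the paper:

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates for a standard k x k convolution
    over an h x w feature map, c_in -> c_out channels."""
    return h * w * k * k * c_in * c_out

def dsconv_macs(h, w, k, c_in, c_out):
    """Depthwise separable conv = depthwise k x k (one filter per
    input channel) followed by a pointwise 1 x 1 convolution."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

# Example layer: 3x3 kernel, 256 -> 256 channels on a 64 x 64 map.
std = conv_macs(64, 64, 3, 256, 256)
ds = dsconv_macs(64, 64, 3, 256, 256)
print(round(std / ds, 1))  # -> 8.7, i.e. ~8.7x fewer operations
```

The ratio approaches 1 / (1/c_out + 1/k²), so the savings grow with the number of output channels, which is why the technique pays off most in the deep, wide layers of an encoder.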
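The focal loss mentioned in the abstract down-weights easy, well-classified pixels so that the abundant background does not dominate training. A minimal sketch of the binary form, with the common default parameters (gamma = 2, alpha = 0.25) rather than values from the paper:

```python
import math

def focal_loss(p, target, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single predicted probability.

    p      -- predicted probability of the positive class
    target -- ground-truth label (1 = object, 0 = background)
    gamma  -- focusing parameter; gamma = 0 reduces to cross-entropy
    alpha  -- class-balancing weight for the positive class
    """
    # p_t is the probability the model assigned to the true class.
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor shrinks the loss of confident,
    # correct predictions, focusing gradient on hard examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently-correct background pixel contributes almost nothing...
easy = focal_loss(0.05, 0)   # p_t = 0.95
# ...while a misclassified object pixel keeps a large loss.
hard = focal_loss(0.05, 1)   # p_t = 0.05
print(easy < hard)  # -> True
```

In a segmentation network the same expression is applied per pixel and per class and averaged; deep-learning frameworks typically provide a batched, multi-class version of this loss.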
ISSN: 2076-3417
DOI: 10.3390/app11199119
DOAJ Record ID: doaj.art-470ee410c52345fc873593c9cd4bc2d2
Author Affiliations:
Seokyong Shin: Department of Plasma Bio Display, Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 01897, Korea
Sanghun Lee: Ingenium College of Liberal Arts, Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 01897, Korea
Hyunho Han: College of General Education, University of Ulsan, 93 Daehak-ro, Nam-gu, Ulsan 44610, Korea