E-HRNet: Enhanced Semantic Segmentation Using Squeeze and Excitation

In the field of computer vision, convolutional neural network (CNN)-based models have demonstrated high accuracy and good generalization performance. However, in semantic segmentation, CNN-based models have a problem—the spatial and global context information is lost owing to a decrease in resolutio...

Bibliographic Details
Main Authors:	Jin-Seong Kim, Sung-Wook Park, Jun-Yeong Kim, Jun Park, Jun-Ho Huh, Se-Hoon Jung, Chun-Bo Sim
Format:	Article
Language:	English
Published:	MDPI AG 2023-08-01
Series:	Electronics
Subjects:	deep learning; computer vision; CNN; attention
Online Access:	https://www.mdpi.com/2079-9292/12/17/3619

_version_	1797582661110202368
author	Jin-Seong Kim, Sung-Wook Park, Jun-Yeong Kim, Jun Park, Jun-Ho Huh, Se-Hoon Jung, Chun-Bo Sim
author_facet	Jin-Seong Kim, Sung-Wook Park, Jun-Yeong Kim, Jun Park, Jun-Ho Huh, Se-Hoon Jung, Chun-Bo Sim
author_sort	Jin-Seong Kim
collection	DOAJ
description	In the field of computer vision, convolutional neural network (CNN)-based models have demonstrated high accuracy and good generalization performance. However, in semantic segmentation, CNN-based models have a problem: spatial and global context information is lost owing to the decrease in resolution during feature extraction. High-resolution networks (HRNets) can address this problem by keeping high-resolution processing layers in parallel. However, information loss still occurs. Therefore, in this study, we propose an HRNet combined with an attention module to address the issue of information loss. The attention module is placed immediately after each convolution to alleviate information loss by emphasizing the information retained at each stage. To achieve this, we employed a squeeze-and-excitation (SE) block as the attention module, which can be integrated into any model and enhances performance without a significant increase in parameters. It emphasizes the spatial and global context information by compressing features through global average pooling (GAP) and recalibrating them. A performance comparison between the existing HRNet model and the proposed model on various datasets shows that the mean class-wise intersection over union (mIoU) and mean pixel accuracy (MeanACC) generally improved with the proposed model, although there was a small increase in the number of parameters. With the Cityscapes dataset, MeanACC decreased by 0.1% with the proposed model compared to the baseline model, but mIoU increased by 0.5%. With the LIP dataset, the MeanACC and mIoU increased by 0.3% and 0.4%, respectively. With the PASCAL Context dataset, the mIoU decreased by 0.1%, whereas the MeanACC increased by 0.7%. Overall, the proposed model showed improved performance compared to the existing model.
first_indexed	2024-03-10T23:24:35Z
format	Article
id	doaj.art-51a6bc6aa6af4de096e6620a9ebc9a2e
institution	Directory Open Access Journal
issn	2079-9292
language	English
last_indexed	2024-03-10T23:24:35Z
publishDate	2023-08-01
publisher	MDPI AG
record_format	Article
series	Electronics
spelling	doaj.art-51a6bc6aa6af4de096e6620a9ebc9a2e; 2023-11-19T08:01:45Z; eng; MDPI AG; Electronics; ISSN 2079-9292; 2023-08-01; vol. 12, no. 17, art. 3619; DOI 10.3390/electronics12173619; E-HRNet: Enhanced Semantic Segmentation Using Squeeze and Excitation; Authors: Jin-Seong Kim [0], Sung-Wook Park [1], Jun-Yeong Kim [2], Jun Park [3], Jun-Ho Huh [4], Se-Hoon Jung [5], Chun-Bo Sim [6]; Affiliations: [0-3, 6] Interdisciplinary Program IT-Bio Convergence System, Sunchon National University, 255 Jungang-ro, Suncheon-city 57922, Jeollanam-do, Republic of Korea; [4] Department of Data Science, (National) Korea Maritime and Ocean University, Busan 49112, Gyeongsang-do, Republic of Korea; [5] Department of Computer Engineering, Sunchon National University, 255 Jungang-ro, Suncheon-city 57922, Jeollanam-do, Republic of Korea; https://www.mdpi.com/2079-9292/12/17/3619; deep learning; computer vision; CNN; attention
title	E-HRNet: Enhanced Semantic Segmentation Using Squeeze and Excitation
title_sort	e hrnet enhanced semantic segmentation using squeeze and excitation
topic	deep learning; computer vision; CNN; attention
url	https://www.mdpi.com/2079-9292/12/17/3619
work_keys_str_mv	AT jinseongkim ehrnetenhancedsemanticsegmentationusingsqueezeandexcitation AT sungwookpark ehrnetenhancedsemanticsegmentationusingsqueezeandexcitation AT junyeongkim ehrnetenhancedsemanticsegmentationusingsqueezeandexcitation AT junpark ehrnetenhancedsemanticsegmentationusingsqueezeandexcitation AT junhohuh ehrnetenhancedsemanticsegmentationusingsqueezeandexcitation AT sehoonjung ehrnetenhancedsemanticsegmentationusingsqueezeandexcitation AT chunbosim ehrnetenhancedsemanticsegmentationusingsqueezeandexcitation
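
The description in this record says the squeeze-and-excitation (SE) block compresses features with global average pooling (GAP) and then recalibrates them. The following is a minimal, dependency-free sketch of that general mechanism, not the authors' implementation. The function name se_block, the weight arguments w1 and w2, the reduction ratio r, and the omission of bias terms are all assumptions made for illustration; the record does not state how the block is configured inside E-HRNet.

```python
import math
import random


def se_block(x, w1, w2):
    """Apply one squeeze-and-excitation step to a feature map.

    x  : feature map as nested lists, shape [C][H][W]
    w1 : reduction weights, shape [C // r][C]  (hypothetical, bias-free)
    w2 : expansion weights, shape [C][C // r]  (hypothetical, bias-free)
    Returns the recalibrated feature map and the per-channel gates.
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid,
    # producing one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]

    # Recalibration: scale every value in a channel by that channel's gate.
    out = [[[g * v for v in row] for row in ch] for ch, g in zip(x, gates)]
    return out, gates


# Small demo with random inputs (all sizes are arbitrary assumptions).
rng = random.Random(0)
C, H, W, r = 4, 2, 2, 2
x = [[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[rng.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
y, gates = se_block(x, w1, w2)
```

In this bias-free form the block adds 2 * C * C / r weights for a feature map with C channels, which is in line with the description's statement that the parameter count rises only slightly. A practical model would implement the same steps with a deep learning framework rather than nested lists.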

Similar Items