E-HRNet: Enhanced Semantic Segmentation Using Squeeze and Excitation

Bibliographic Details
Main Authors: Jin-Seong Kim, Sung-Wook Park, Jun-Yeong Kim, Jun Park, Jun-Ho Huh, Se-Hoon Jung, Chun-Bo Sim
Format: Article
Language: English
Published: MDPI AG 2023-08-01
Series: Electronics
Subjects: deep learning, computer vision, CNN, attention
Online Access:https://www.mdpi.com/2079-9292/12/17/3619
Collection: DOAJ (Directory of Open Access Journals)

Abstract
In computer vision, convolutional neural network (CNN)-based models have demonstrated high accuracy and good generalization performance. In semantic segmentation, however, CNN-based models lose spatial and global context information because resolution decreases during feature extraction. High-resolution networks (HRNets) mitigate this problem by maintaining high-resolution processing layers in parallel, but some information loss still occurs. In this study, we therefore propose an HRNet combined with an attention module to address this information loss. The attention module is placed immediately after each convolution to alleviate information loss by emphasizing the information retained at each stage. As the attention module, we employ a squeeze-and-excitation (SE) block, which can be easily integrated into existing models and improves performance without a significant increase in parameters. The SE block emphasizes spatial and global context information by compressing features through global average pooling (GAP) and then recalibrating them. A performance comparison between the baseline HRNet and the proposed model on several datasets shows that the proposed model generally improves the mean class-wise intersection over union (mIoU) and the mean pixel accuracy (MeanACC), at the cost of a small increase in the number of parameters. On the Cityscapes dataset, MeanACC decreased by 0.1% relative to the baseline, but mIoU increased by 0.5%. On the LIP dataset, MeanACC and mIoU increased by 0.3% and 0.4%, respectively. On the PASCAL Context dataset, mIoU decreased by 0.1%, whereas MeanACC increased by 0.7%. Overall, the proposed model outperformed the baseline.
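
The squeeze-and-excitation step described in the abstract can be sketched in a few lines. This is a minimal NumPy sketch of the standard SE design (global average pooling, a two-layer fully connected bottleneck with ReLU, then a sigmoid gate that rescales each channel), not the authors' implementation: the reduction ratio `r`, the random weights, the single-image `(C, H, W)` layout, and the helper names `se_block` and `se_param_count` are illustrative assumptions. The input `x` stands in for the output of one of the HRNet convolutions after which the module is placed.

```python
import numpy as np


def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation recalibration of one feature map (illustrative sketch).

    x  : (C, H, W) feature map, e.g. the output of a convolution
    w1 : (C // r, C), b1 : (C // r,)  -> reduction FC layer (assumed shapes)
    w2 : (C, C // r), b2 : (C,)       -> expansion FC layer (assumed shapes)
    Returns the recalibrated map and the per-channel gate.
    """
    # Squeeze: global average pooling reduces each channel to a single scalar.
    z = x.mean(axis=(1, 2))                          # (C,)
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one weight in (0, 1) per channel.
    s = np.maximum(w1 @ z + b1, 0.0)                 # (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))      # (C,)
    # Recalibration: scale every spatial position of channel c by gate[c].
    return x * gate[:, None, None], gate


def se_param_count(channels, r):
    """Parameters added by one SE block with biases (hypothetical helper)."""
    hidden = channels // r
    return channels * hidden + hidden + hidden * channels + channels


# Usage with random weights (assumption: C = 8, r = 4, small random init).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
x = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // r, C))
b1 = np.zeros(C // r)
w2 = 0.1 * rng.standard_normal((C, C // r))
b2 = np.zeros(C)

y, gate = se_block(x, w1, b1, w2, b2)
print(y.shape)                  # (8, 4, 4): same shape as the input
print(se_param_count(64, 16))   # 580 extra parameters for a 64-channel block
print(3 * 3 * 64 * 64)          # 36864 weights in one 3x3, 64-to-64 convolution
```

The last two lines are consistent with the abstract's observation that the parameter count grows only slightly: with 64 channels and an assumed reduction ratio of 16, one SE block adds 580 parameters, compared with 36,864 weights in a single 3x3 convolution of the same width. Because the gate is one scalar per channel, the spatial layout within each channel is untouched; only the relative emphasis across channels changes.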

Additional Details
ISSN: 2079-9292
DOI: 10.3390/electronics12173619
Volume 12, Issue 17, Article 3619
Affiliations:
Jin-Seong Kim, Sung-Wook Park, Jun-Yeong Kim, Jun Park, Chun-Bo Sim: Interdisciplinary Program IT-Bio Convergence System, Sunchon National University, 255 Jungang-ro, Suncheon-city 57922, Jeollanam-do, Republic of Korea
Jun-Ho Huh: Department of Data Science, (National) Korea Maritime and Ocean University, Busan 49112, Gyeongsang-do, Republic of Korea
Se-Hoon Jung: Department of Computer Engineering, Sunchon National University, 255 Jungang-ro, Suncheon-city 57922, Jeollanam-do, Republic of Korea
Keywords: deep learning, computer vision, CNN, attention