E-HRNet: Enhanced Semantic Segmentation Using Squeeze and Excitation
In the field of computer vision, convolutional neural network (CNN)-based models have demonstrated high accuracy and good generalization performance. However, in semantic segmentation, CNN-based models have a problem—the spatial and global context information is lost owing to a decrease in resolutio...
Main Authors: | Jin-Seong Kim, Sung-Wook Park, Jun-Yeong Kim, Jun Park, Jun-Ho Huh, Se-Hoon Jung, Chun-Bo Sim |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-08-01 |
Series: | Electronics |
Subjects: | deep learning, computer vision, CNN, attention |
Online Access: | https://www.mdpi.com/2079-9292/12/17/3619 |
_version_ | 1797582661110202368 |
---|---|
author | Jin-Seong Kim, Sung-Wook Park, Jun-Yeong Kim, Jun Park, Jun-Ho Huh, Se-Hoon Jung, Chun-Bo Sim |
author_facet | Jin-Seong Kim, Sung-Wook Park, Jun-Yeong Kim, Jun Park, Jun-Ho Huh, Se-Hoon Jung, Chun-Bo Sim |
author_sort | Jin-Seong Kim |
collection | DOAJ |
description | In computer vision, convolutional neural network (CNN)-based models have demonstrated high accuracy and good generalization performance. In semantic segmentation, however, CNN-based models have a problem: spatial and global context information is lost as resolution decreases during feature extraction. High-resolution networks (HRNets) are designed to resolve this problem by keeping high-resolution processing layers in parallel, but information loss still occurs. In this study, we therefore propose an HRNet combined with an attention module to address this information loss. The attention module is placed immediately after each convolution to alleviate information loss by emphasizing the information retained at each stage. As the attention module, we employ a squeeze-and-excitation (SE) block, which can be integrated into any model and enhances performance without a significant increase in parameters. It emphasizes the spatial and global context information by compressing features through global average pooling (GAP) and then recalibrating them. A performance comparison between the existing HRNet model and the proposed model on several datasets shows that the mean class-wise intersection over union (mIoU) and mean pixel accuracy (MeanACC) generally improved with the proposed model, at the cost of a small increase in the number of parameters. With the Cityscapes dataset, MeanACC decreased by 0.1% with the proposed model compared to the baseline model, but mIoU increased by 0.5%. With the LIP dataset, MeanACC and mIoU increased by 0.3% and 0.4%, respectively. With the PASCAL Context dataset, mIoU decreased by 0.1%, whereas MeanACC increased by 0.7%. Overall, the proposed model showed improved performance compared to the existing model. |
first_indexed | 2024-03-10T23:24:35Z |
format | Article |
id | doaj.art-51a6bc6aa6af4de096e6620a9ebc9a2e |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-10T23:24:35Z |
publishDate | 2023-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-51a6bc6aa6af4de096e6620a9ebc9a2e; 2023-11-19T08:01:45Z; eng; MDPI AG; Electronics; ISSN 2079-9292; 2023-08-01; vol. 12, no. 17, art. 3619; doi:10.3390/electronics12173619; E-HRNet: Enhanced Semantic Segmentation Using Squeeze and Excitation; Authors: Jin-Seong Kim [0], Sung-Wook Park [1], Jun-Yeong Kim [2], Jun Park [3], Jun-Ho Huh [4], Se-Hoon Jung [5], Chun-Bo Sim [6]; Affiliations: [0-3, 6] Interdisciplinary Program IT-Bio Convergence System, Sunchon National University, 255 Jungang-ro, Suncheon-city 57922, Jeollanam-do, Republic of Korea; [4] Department of Data Science, (National) Korea Maritime and Ocean University, Busan 49112, Gyeongsang-do, Republic of Korea; [5] Department of Computer Engineering, Sunchon National University, 255 Jungang-ro, Suncheon-city 57922, Jeollanam-do, Republic of Korea; https://www.mdpi.com/2079-9292/12/17/3619; deep learning; computer vision; CNN; attention |
spellingShingle | Jin-Seong Kim; Sung-Wook Park; Jun-Yeong Kim; Jun Park; Jun-Ho Huh; Se-Hoon Jung; Chun-Bo Sim; E-HRNet: Enhanced Semantic Segmentation Using Squeeze and Excitation; Electronics; deep learning; computer vision; CNN; attention |
title | E-HRNet: Enhanced Semantic Segmentation Using Squeeze and Excitation |
title_full | E-HRNet: Enhanced Semantic Segmentation Using Squeeze and Excitation |
title_fullStr | E-HRNet: Enhanced Semantic Segmentation Using Squeeze and Excitation |
title_full_unstemmed | E-HRNet: Enhanced Semantic Segmentation Using Squeeze and Excitation |
title_short | E-HRNet: Enhanced Semantic Segmentation Using Squeeze and Excitation |
title_sort | e hrnet enhanced semantic segmentation using squeeze and excitation |
topic | deep learning; computer vision; CNN; attention |
url | https://www.mdpi.com/2079-9292/12/17/3619 |
work_keys_str_mv | AT jinseongkim ehrnetenhancedsemanticsegmentationusingsqueezeandexcitation AT sungwookpark ehrnetenhancedsemanticsegmentationusingsqueezeandexcitation AT junyeongkim ehrnetenhancedsemanticsegmentationusingsqueezeandexcitation AT junpark ehrnetenhancedsemanticsegmentationusingsqueezeandexcitation AT junhohuh ehrnetenhancedsemanticsegmentationusingsqueezeandexcitation AT sehoonjung ehrnetenhancedsemanticsegmentationusingsqueezeandexcitation AT chunbosim ehrnetenhancedsemanticsegmentationusingsqueezeandexcitation |
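
The description field above refers to two mechanisms that a short example can make concrete: the squeeze-and-excitation (SE) recalibration built on global average pooling, and the mIoU and MeanACC metrics used in the comparison. The following is a minimal pure-Python sketch, not the authors' implementation. The function names, the weight matrices `w1` and `w2`, the omission of biases, and the reading of MeanACC as mean class-wise pixel accuracy are all assumptions made for illustration; the record does not give the paper's exact formulas or layer sizes.

```python
import math


def se_recalibrate(features, w1, w2):
    """Reweight a C x H x W feature map in the style of an SE block.

    features: list of C channels, each a list of H rows of W floats.
    w1: reduction weights (C_reduced x C); w2: expansion weights (C x C_reduced).
    Both matrices are hypothetical placeholders; biases are omitted.
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in features]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid,
    # producing one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Recalibrate: scale every value in a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(features, gates)]
    return scaled, gates


def segmentation_metrics(confusion):
    """Return (mean class-wise pixel accuracy, mIoU) from a K x K matrix.

    confusion[i][j] counts pixels of true class i predicted as class j.
    Classes with no ground-truth pixels are skipped for accuracy, and
    classes with an empty union are skipped for IoU.
    """
    k = len(confusion)
    accs, ious = [], []
    for c in range(k):
        tp = confusion[c][c]
        gt = sum(confusion[c])                         # true-class-c pixels
        pred = sum(confusion[r][c] for r in range(k))  # pixels predicted as c
        if gt > 0:
            accs.append(tp / gt)
        union = gt + pred - tp
        if union > 0:
            ious.append(tp / union)
    return sum(accs) / len(accs), sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy inputs chosen for illustration only: 2 channels of 2 x 2 values.
    feats = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 0.0]]]
    _, gates = se_recalibrate(feats, w1=[[0.5, 0.5]], w2=[[1.0], [-1.0]])
    print("channel gates:", gates)
    print("MeanACC, mIoU:", segmentation_metrics([[3, 1], [1, 5]]))
```

In this sketch the gate depends only on per-channel averages, so the block adds just the two small weight matrices, which is consistent with the description's remark that the parameter increase is small. The two metrics can move in opposite directions because IoU penalizes false positives while class-wise accuracy does not, which matches the mixed Cityscapes and PASCAL Context results reported above.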