Lite‐weight semantic segmentation with AG self‐attention

Abstract: Because semantic segmentation incurs large computational and GPU-memory costs, some works focus on designing lightweight models that achieve a good trade-off between computational cost and accuracy. A common approach is to combine a CNN with a vision transformer. However, these methods ignore the contextual information of multiple receptive fields, and they often fail to compensate for the detail lost when downsampling multi-scale features. To fix these issues, we propose AG Self-Attention, which consists of Enhanced Atrous Self-Attention (EASA) and Gate Attention. AG Self-Attention adds the contextual information of multiple receptive fields to the global semantic feature. Specifically, Enhanced Atrous Self-Attention applies a weight-shared atrous convolution with different atrous rates to gather contextual information under several specific receptive fields. Gate Attention introduces a gating mechanism that injects detailed information into the global semantic feature and filters it by producing a "fusion" gate and an "update" gate. To validate this design, we conduct extensive experiments on common semantic segmentation datasets (ADE20K, COCO-Stuff, PASCAL Context, and Cityscapes), showing that our method achieves state-of-the-art performance and a good trade-off between computational cost and accuracy.
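
The abstract gives enough of EASA's shape to sketch in code. Below is a minimal PyTorch sketch, not the authors' implementation: the module name, the depthwise 3×3 kernel, the atrous rates (1, 2, 3), and the choice to add the multi-rate context to the self-attention output are all assumptions, since the abstract only states that one shared kernel is reused at several atrous rates.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnhancedAtrousSelfAttention(nn.Module):
    """Sketch of EASA: one shared conv kernel reused at several atrous
    (dilation) rates, so each rate sees a different receptive field
    without adding parameters. The multi-rate context is then fused
    into a standard multi-head self-attention output; that fusion
    point is an assumption, as the abstract does not specify it."""

    def __init__(self, dim, num_heads=4, rates=(1, 2, 3)):
        super().__init__()
        self.rates = rates
        # A single depthwise 3x3 kernel, shared across every atrous rate.
        self.shared_weight = nn.Parameter(torch.empty(dim, 1, 3, 3))
        nn.init.kaiming_uniform_(self.shared_weight)
        # dim must be divisible by num_heads.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):                      # x: (B, C, H, W)
        B, C, H, W = x.shape
        # Multi-receptive-field context: the same weights applied at
        # each dilation rate, averaged (padding=r keeps H, W fixed).
        ctx = sum(
            F.conv2d(x, self.shared_weight, padding=r, dilation=r, groups=C)
            for r in self.rates
        ) / len(self.rates)
        # Global semantic feature from plain self-attention over tokens.
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        attn_out = attn_out.transpose(1, 2).reshape(B, C, H, W)
        # Inject the multi-scale context into the global feature.
        return self.proj(attn_out + ctx)
```

Reusing one kernel across all rates keeps the parameter count at that of a single 3×3 depthwise convolution no matter how many receptive fields are probed, which is consistent with the paper's lightweight goal.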


Bibliographic Details
Main Authors: Bing Liu, Yansheng Gao, Hai Li, Zhaohao Zhong, Hongwei Zhao
Format: Article
Language: English
Published: Wiley, 2024-02-01
Series: IET Computer Vision
ISSN: 1751-9632, 1751-9640
Subjects: computational complexity, convolutional neural nets
Online Access: https://doi.org/10.1049/cvi2.12225
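
A similar sketch is possible for Gate Attention. The abstract only says that a gating mechanism produces a "fusion" gate and an "update" gate to inject detailed information into the global semantic feature; the GRU-style formulation below (sigmoid gates computed from the concatenated feature pair, then a gated mix) is an assumption, not the published design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GateAttention(nn.Module):
    """Sketch of Gate Attention under an assumed GRU-style gating:
    the 'fusion' gate decides how much high-resolution detail to let
    in, and the 'update' gate decides how much of the global semantic
    feature to keep."""

    def __init__(self, dim):
        super().__init__()
        self.fusion_gate = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=1), nn.Sigmoid())
        self.update_gate = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=1), nn.Sigmoid())

    def forward(self, semantic, detail):
        # semantic: low-resolution global feature  (B, C, h, w)
        # detail:   high-resolution shallow feature (B, C, H, W)
        # Upsample the semantic feature so detail lost during
        # downsampling can be re-injected at full resolution.
        semantic = F.interpolate(semantic, size=detail.shape[-2:],
                                 mode="bilinear", align_corners=False)
        pair = torch.cat([semantic, detail], dim=1)
        f = self.fusion_gate(pair)   # how much detail to let in
        u = self.update_gate(pair)   # how much semantics to keep
        return u * semantic + f * detail
```

For example, `GateAttention(dim=64)(semantic, detail)` with a (B, 64, 16, 16) semantic map and a (B, 64, 64, 64) detail map yields a (B, 64, 64, 64) fused output.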