CAT: Learning to collaborate channel and spatial attention from multi‐information fusion

Abstract Channel and spatial attention mechanisms have been shown to boost the performance of deep convolutional neural networks. Most existing methods use only one of the two, or run them in parallel or in series, neglecting the collaboration between the two attentions. In order to better establish the featu...

Bibliographic Details
Main Authors: Zizhang Wu, Man Wang, Weiwei Sun, Yuchen Li, Tianhao Xu, Fan Wang, Keke Huang
Format: Article
Language: English
Published: Wiley 2023-04-01
Series: IET Computer Vision
Subjects: channel attention; dynamic learning; entropy pooling; spatial attention
Online Access: https://doi.org/10.1049/cvi2.12166
author Zizhang Wu
Man Wang
Weiwei Sun
Yuchen Li
Tianhao Xu
Fan Wang
Keke Huang
collection DOAJ
description Abstract Channel and spatial attention mechanisms have been shown to boost the performance of deep convolutional neural networks. Most existing methods use only one of the two, or run them in parallel or in series, neglecting the collaboration between the two attentions. To better establish feature interaction between the two types of attention, a plug‐and‐play attention module is proposed, termed ‘CAT’, which activates the Collaboration between spatial and channel Attentions based on learned Traits. Specifically, traits are represented as trainable coefficients (i.e. colla‐factors) that adaptively combine the contributions of different attention modules to better fit different image hierarchies and tasks. Moreover, global entropy pooling is proposed alongside the global average pooling and global maximum pooling (GMP) operators; it suppresses noise signals by measuring the information disorder of feature maps. A three‐way pooling operation is introduced into the attention modules, and the adaptive mechanism is applied to fuse their outcomes. Extensive experiments on MS COCO, Pascal‐VOC, Cifar‐100, and ImageNet show that CAT outperforms existing state‐of‐the‐art attention mechanisms in object detection, instance segmentation, and image classification. The model and code will be released soon.
first_indexed 2024-04-09T17:52:51Z
format Article
id doaj.art-b41cca5fbbbc4baca7e77c0141cf8d59
institution Directory Open Access Journal
issn 1751-9632
1751-9640
language English
last_indexed 2024-04-09T17:52:51Z
publishDate 2023-04-01
publisher Wiley
record_format Article
series IET Computer Vision
spelling doaj.art-b41cca5fbbbc4baca7e77c0141cf8d59
2023-04-15T11:16:51Z
eng
Wiley
IET Computer Vision
1751-9632
1751-9640
2023-04-01
Volume 17, Issue 3, pp. 309-318
10.1049/cvi2.12166
CAT: Learning to collaborate channel and spatial attention from multi‐information fusion
Zizhang Wu (Zongmu Technology, Shanghai, China)
Man Wang (Zongmu Technology, Shanghai, China)
Weiwei Sun (Zongmu Technology, Shanghai, China)
Yuchen Li (Zongmu Technology, Shanghai, China)
Tianhao Xu (Zongmu Technology, Shanghai, China)
Fan Wang (Zongmu Technology, Shanghai, China)
Keke Huang (Central South University, Changsha, China)
https://doi.org/10.1049/cvi2.12166
channel attention
dynamic learning
entropy pooling
spatial attention
title CAT: Learning to collaborate channel and spatial attention from multi‐information fusion
topic channel attention
dynamic learning
entropy pooling
spatial attention
url https://doi.org/10.1049/cvi2.12166
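
Illustrative sketch (not part of the catalogue record, and not the authors' released code): the abstract describes a three‐way pooling operation (average, maximum, entropy) whose outcomes are fused by trainable coefficients. The plain-Python example below shows one way such a fusion could look. Everything in it is an assumption made for illustration: the function names, the definition of entropy pooling as the Shannon entropy of a softmax over spatial positions, and the softmax normalisation of the coefficients are not taken from the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def global_average_pool(channel):
    """Mean activation of one H x W channel (a list of rows)."""
    flat = [v for row in channel for v in row]
    return sum(flat) / len(flat)


def global_max_pool(channel):
    """Largest activation of one H x W channel."""
    return max(v for row in channel for v in row)


def global_entropy_pool(channel):
    """ASSUMPTION: entropy pooling taken as the Shannon entropy of a
    softmax over spatial positions. The abstract only says that it
    measures the information disorder of feature maps."""
    flat = [v for row in channel for v in row]
    probs = softmax(flat)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fuse_channel_descriptors(feature_map, colla_factors):
    """Combine the three pooled descriptors of every channel.

    feature_map:   list of channels, each a list of rows of floats.
    colla_factors: three raw coefficients (hypothetical stand-ins for the
                   trainable colla-factors). ASSUMPTION: they are
                   normalised with a softmax before weighting.
    """
    weights = softmax(colla_factors)
    fused = []
    for channel in feature_map:
        pooled = (
            global_average_pool(channel),
            global_max_pool(channel),
            global_entropy_pool(channel),
        )
        fused.append(sum(w * p for w, p in zip(weights, pooled)))
    return fused


if __name__ == "__main__":
    flat_channel = [[1.0, 1.0], [1.0, 1.0]]     # uniform activations
    peaked_channel = [[10.0, 0.0], [0.0, 0.0]]  # one dominant position
    print(fuse_channel_descriptors([flat_channel, peaked_channel], [0.0, 0.0, 0.0]))
```

Under this assumed definition, a uniform channel yields the maximum entropy for its size and a sharply peaked channel yields a value near zero, so the entropy term gives the fusion a signal that average and maximum pooling do not carry. In a real network the three coefficients would be learned by gradient descent rather than fixed by hand.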