Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting

Bibliographic Details
Authors: YU Ying, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong
Format: Article
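Language: Chinese (zho)
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press, 2022-11-01
Series: Jisuanji kexue yu tansuo (Journal of Frontiers of Computer Science and Technology)
Subjects: crowd counting; encoder-decoder network; attention; feature fusion; deep learning
Online Access: http://fcst.ceaj.org/fileup/1673-9418/PDF/2104122.pdf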
collection DOAJ
description Crowd counting aims to accurately predict the number, distribution, and density of people in real scenes. However, it is often hampered by complex backgrounds, widely varying target scales, and cluttered crowd distributions, all of which strongly affect counting precision. To address these problems, a channel and spatial attention-based encoder-decoder network for crowd counting (CSANet) is proposed. It uses a multi-level encoder-decoder network to extract multi-scale semantic features and fully integrates spatial context information to handle variation in pedestrian scale and cluttered distributions in complex scenes. To reduce the impact of complex backgrounds on counting performance, channel and spatial attention are introduced during feature fusion: they increase the feature weights of crowd regions to highlight regions of interest and decrease the feature weights of weakly correlated background regions to suppress background noise, which improves the quality of the crowd density map. To verify the effectiveness of the proposed algorithm, experiments are conducted on several classical crowd counting datasets. Compared with existing crowd counting algorithms, CSANet performs well in multi-scale feature extraction and background noise suppression, and it substantially improves counting accuracy and robustness in dense scenes.
id doaj.art-f3498a0ebfdf472592e2797fb3190d95
institution Directory of Open Access Journals
issn 1673-9418
affiliation College of Software, East China Jiaotong University, Nanchang 330013, China
volume 16
issue 11
pages 2547-2556
doi 10.3778/j.issn.1673-9418.2104122
topic crowd counting; encoder-decoder network; attention; feature fusion; deep learning
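
The abstract says that channel and spatial attention reweight fused encoder-decoder features before the density map is produced, but it gives no formulas. The NumPy code below is a minimal sketch of that idea. It assumes a CBAM-style design: channel weights come from a shared two-layer MLP applied to average- and max-pooled descriptors, and spatial weights come from a 7x7 convolution over channel-wise mean and max maps. It also assumes nearest-neighbour 2x upsampling of the decoder feature and a 1x1 regression head. None of this is taken from the paper. The names `channel_attention`, `spatial_attention`, `conv2d_same`, `fuse_and_count`, the `params` dictionary, the reduction ratio `r`, and all sizes are hypothetical, and the weights are random rather than trained. The code does follow the usual density-map convention in crowd counting, where the predicted count is the sum of the density map.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight the channels of feat (C, H, W) with a shared MLP (C -> C/r -> C)
    applied to global average- and max-pooled descriptors (CBAM-style assumption)."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)
    weights = sigmoid(mlp(feat.mean(axis=(1, 2))) + mlp(feat.max(axis=(1, 2))))  # (C,)
    return feat * weights[:, None, None], weights


def conv2d_same(x, kernel):
    """'Same'-padded 2-D cross-correlation of x (Cin, H, W) with kernel (Cin, k, k),
    summed over input channels; k must be odd. Returns an (H, W) map."""
    k = kernel.shape[-1]
    p = k // 2
    h, w = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            out += np.einsum("c,chw->hw", kernel[:, i, j], xp[:, i:i + h, j:j + w])
    return out


def spatial_attention(feat, kernel):
    """Reweight the spatial positions of feat (C, H, W) using a convolution over
    the channel-wise mean and max maps (CBAM-style assumption)."""
    desc = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    weights = sigmoid(conv2d_same(desc, kernel))           # (H, W)
    return feat * weights[None, :, :], weights


def fuse_and_count(dec, enc, params):
    """Hypothetical decoder step: upsample the coarse decoder feature 2x, concatenate
    it with the encoder skip feature, apply channel then spatial attention, and
    regress a non-negative density map whose sum is the predicted count."""
    up = dec.repeat(2, axis=1).repeat(2, axis=2)           # nearest-neighbour 2x upsampling
    fused = np.concatenate([up, enc], axis=0)             # (C_dec + C_enc, H, W)
    fused, _ = channel_attention(fused, params["w1"], params["w2"])
    fused, _ = spatial_attention(fused, params["kernel"])
    density = np.maximum(np.einsum("c,chw->hw", params["head"], fused), 0.0)  # 1x1 conv + ReLU
    return density, float(density.sum())


# Usage with random (untrained) weights, only to show the shapes involved.
rng = np.random.default_rng(0)
C_dec, C_enc, H, W, r = 8, 8, 16, 16, 4
C = C_dec + C_enc
params = {
    "w1": rng.normal(size=(C // r, C)),
    "w2": rng.normal(size=(C, C // r)),
    "kernel": rng.normal(size=(2, 7, 7)) * 0.1,
    "head": np.abs(rng.normal(size=C)) * 0.01,
}
dec = rng.random((C_dec, H // 2, W // 2))
enc = rng.random((C_enc, H, W))
density, count = fuse_and_count(dec, enc, params)
print(density.shape, round(count, 2))
```

The channel step applies one weight to each feature map, and the spatial step applies one weight to each pixel. This matches the abstract's description of raising the weights of crowd regions and lowering those of background regions. In a trained network these weights would be learned. Here they only show the data flow.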