Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting

Crowd counting aims to accurately predict the number, distribution, and density of crowds in real scenes. In practice it is hampered by complex backgrounds, diverse target scales, and cluttered crowd distributions, which strongly affect the precision of...


Bibliographic Details
Main Authors:	YU Ying, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong
Format:	Article
Language:	zho
Published:	Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press 2022-11-01
Series:	Jisuanji kexue yu tansuo
Subjects:	crowd counting | encoder-decoder network | attention | feature fusion | deep learning
Online Access:	http://fcst.ceaj.org/fileup/1673-9418/PDF/2104122.pdf

_version_	1797983589812404224
author	YU Ying, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong
author_facet	YU Ying, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong
author_sort	YU Ying, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong
collection	DOAJ
description	Crowd counting aims to accurately predict the number, distribution, and density of crowds in real scenes. In practice it is hampered by complex backgrounds, diverse target scales, and cluttered crowd distributions, all of which strongly affect counting precision. To address these problems, a channel and spatial attention-based encoder-decoder network for crowd counting (CSANet) is proposed. It uses a multi-level encoder-decoder network to extract multi-scale semantic features and fully integrates spatial context information to handle pedestrian scale variation and irregular distribution in complex scenes. To reduce the impact of complex backgrounds on counting performance, channel and spatial attention are introduced during feature fusion: increasing the feature weights of crowd regions highlights regions of interest, while decreasing the feature weights of weakly correlated background regions suppresses background noise, which improves the quality of the crowd density map. To verify the effectiveness of the proposed algorithm, experiments are conducted on several classical crowd counting datasets. The results show that, compared with existing crowd counting algorithms, CSANet performs well in multi-scale feature extraction and background noise suppression, greatly improving the accuracy and robustness of counting in dense scenes.
first_indexed	2024-04-11T06:49:42Z
format	Article
id	doaj.art-f3498a0ebfdf472592e2797fb3190d95
institution	Directory Open Access Journal
issn	1673-9418
language	zho
last_indexed	2024-04-11T06:49:42Z
publishDate	2022-11-01
publisher	Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
record_format	Article
series	Jisuanji kexue yu tansuo
spelling	doaj.art-f3498a0ebfdf472592e2797fb3190d95 | 2022-12-22T04:39:15Z | zho | Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press | Jisuanji kexue yu tansuo | 1673-9418 | 2022-11-01 | vol. 16, no. 11, pp. 2547-2556 | 10.3778/j.issn.1673-9418.2104122 | Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting | YU Ying, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong | College of Software, East China Jiaotong University, Nanchang 330013, China | http://fcst.ceaj.org/fileup/1673-9418/PDF/2104122.pdf | crowd counting | encoder-decoder network | attention | feature fusion | deep learning
spellingShingle	YU Ying, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong | Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting | Jisuanji kexue yu tansuo | crowd counting | encoder-decoder network | attention | feature fusion | deep learning
title	Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting
title_full	Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting
title_fullStr	Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting
title_full_unstemmed	Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting
title_short	Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting
title_sort	encoder decoder network fusing channel and spatial attention for crowd counting
topic	crowd counting | encoder-decoder network | attention | feature fusion | deep learning
url	http://fcst.ceaj.org/fileup/1673-9418/PDF/2104122.pdf
work_keys_str_mv	AT yuyingpanchengzhuhuilinqianjintanghong encoderdecodernetworkfusingchannelandspatialattentionforcrowdcounting
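
Illustrative note: the description above says that channel and spatial attention are applied during feature fusion to raise the weights of crowd regions and lower those of background regions. The record does not give CSANet's actual layers, so the following is only a minimal NumPy sketch of a generic channel-then-spatial attention step under stated assumptions. All function names, the reduction ratio of 4, the additive fusion of encoder and decoder features, and the use of a plain sum in place of a learned convolution in the spatial branch are hypothetical and are not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Per-channel weights for a (C, H, W) feature map.

    Global average pooling followed by a two-layer bottleneck.
    w1 has shape (C // r, C) and w2 has shape (C, C // r); both are
    hypothetical stand-ins for learned parameters.
    """
    pooled = feat.mean(axis=(1, 2))            # (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)      # ReLU, (C // r,)
    return sigmoid(w2 @ hidden)[:, None, None]  # (C, 1, 1)


def spatial_attention(feat):
    """Per-location weights for a (C, H, W) feature map.

    Assumption: the channel-wise mean and max maps are simply summed;
    a trained module would normally pass them through a learned conv.
    """
    avg_map = feat.mean(axis=0)                # (H, W)
    max_map = feat.max(axis=0)                 # (H, W)
    return sigmoid(avg_map + max_map)[None, :, :]  # (1, H, W)


def fuse_with_attention(enc_feat, dec_feat, w1, w2):
    """Fuse encoder and decoder features, then reweight them.

    Assumption: fusion is an element-wise sum of same-shaped maps.
    """
    fused = enc_feat + dec_feat
    fused = fused * channel_attention(fused, w1, w2)  # reweight channels
    fused = fused * spatial_attention(fused)          # reweight locations
    return fused


# Usage with random stand-in features and weights.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
enc = rng.standard_normal((C, H, W))
dec = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
out = fuse_with_attention(enc, dec, w1, w2)
print(out.shape)
```

Because both attention maps lie in (0, 1), each fused value can only be scaled down, never amplified. In a trained network the learned weights would decide which channels and locations keep most of their magnitude.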

Similar Items