Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting
Authors: | YU Ying, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong |
---|---|
Format: | Article |
Language: | Chinese (zho) |
Published: | Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press, 2022-11-01 |
Series: | Jisuanji kexue yu tansuo (Journal of Frontiers of Computer Science and Technology) |
Online Access: | http://fcst.ceaj.org/fileup/1673-9418/PDF/2104122.pdf |

Field | Value |
---|---|
author | YU Ying, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong |
collection | DOAJ |
description | Crowd counting aims to accurately predict the number, distribution, and density of people in real scenes. However, its precision is strongly affected by complex backgrounds, widely varying target scales, and cluttered crowd distributions. To address these problems, this paper proposes CSANet, an encoder-decoder network for crowd counting based on channel and spatial attention. CSANet uses a multi-level encoder-decoder network to extract multi-scale semantic features and fully integrates spatial context information, addressing pedestrian scale variation and disordered crowd distribution in complex scenes. To reduce the impact of complex backgrounds on counting performance, channel and spatial attention are introduced during feature fusion: they increase the feature weights of crowd regions to highlight regions of interest and decrease the weights of weakly correlated background regions to suppress background noise, which improves the quality of the crowd density map. Experiments on several classic crowd counting datasets show that, compared with existing crowd counting algorithms, CSANet performs well in multi-scale feature extraction and background noise suppression and greatly improves counting accuracy and robustness in dense scenes. |
first_indexed | 2024-04-11T06:49:42Z |
id | doaj.art-f3498a0ebfdf472592e2797fb3190d95 |
institution | Directory of Open Access Journals |
issn | 1673-9418 |
last_indexed | 2024-04-11T06:49:42Z |
affiliation | College of Software, East China Jiaotong University, Nanchang 330013, China |
doi | 10.3778/j.issn.1673-9418.2104122 |
volume_issue_pages | Vol. 16, No. 11, pp. 2547-2556 |
topic | crowd counting; encoder-decoder network; attention; feature fusion; deep learning |
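
The abstract says channel and spatial attention are applied during encoder-decoder feature fusion, but it does not specify the modules. The NumPy sketch below is therefore only an illustration of the general technique under stated assumptions, not CSANet itself: it assumes CBAM-style channel attention (a shared MLP over average- and max-pooled channel descriptors) followed by CBAM-style spatial attention (a convolution over the channel-wise mean and max maps). The fusion step (2x nearest-neighbour upsampling plus element-wise addition of an encoder skip feature), the 1x1 density head, all function and parameter names, and the random untrained weights are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """CBAM-style channel attention (assumed design, not confirmed by the paper).

    feat: (C, H, W) feature map; w1: (C // r, C) and w2: (C, C // r) are the
    weights of a shared two-layer MLP with reduction ratio r.
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # FC -> ReLU -> FC

    avg_desc = feat.mean(axis=(1, 2))  # (C,) global average pooling
    max_desc = feat.max(axis=(1, 2))   # (C,) global max pooling
    weights = sigmoid(mlp(avg_desc) + mlp(max_desc))  # (C,) values in (0, 1)
    return feat * weights[:, None, None]


def spatial_attention(feat, kernel):
    """CBAM-style spatial attention (assumed design).

    kernel: (2, k, k) weights of a single-output convolution applied to the
    stacked channel-wise mean and max maps; zero padding keeps H x W.
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    height, width = feat.shape[1:]
    logits = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            # Cross-correlation, as computed by deep-learning "conv" layers.
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    mask = sigmoid(logits)  # (H, W) per-pixel weights in (0, 1)
    return feat * mask[None, :, :]


def fuse(decoder_feat, encoder_feat, params):
    """Hypothetical fusion step: upsample the decoder feature 2x (nearest
    neighbour), add the encoder skip feature, then apply channel attention
    followed by spatial attention."""
    up = decoder_feat.repeat(2, axis=1).repeat(2, axis=2)
    fused = up + encoder_feat
    fused = channel_attention(fused, params["w1"], params["w2"])
    return spatial_attention(fused, params["kernel"])


def density_head(feat, w):
    """Hypothetical 1x1 convolution + ReLU mapping (C, H, W) to an (H, W) density map."""
    return np.maximum(np.einsum("c,chw->hw", w, feat), 0.0)


def count_from_density(density):
    """The predicted count is the sum (integral) of the density map."""
    return float(density.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, r = 8, 2
    params = {
        "w1": 0.1 * rng.normal(size=(C // r, C)),
        "w2": 0.1 * rng.normal(size=(C, C // r)),
        "kernel": 0.1 * rng.normal(size=(2, 7, 7)),
    }
    decoder_feat = rng.random((C, 4, 4))  # coarse decoder feature
    encoder_feat = rng.random((C, 8, 8))  # finer encoder skip feature
    fused = fuse(decoder_feat, encoder_feat, params)
    density = density_head(fused, np.full(C, 0.01))
    print(fused.shape, density.shape, count_from_density(density))
```

Because every attention weight lies in (0, 1), the two modules only rescale features relative to one another. In a trained model this is the mechanism by which crowd regions can keep strong responses while weakly correlated background regions are damped before the density map is summed into a count.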