Multi-scale cross-layer fusion and center position network for pedestrian detection

Pedestrian detection has made breakthroughs after the rise of convolutional neural networks. However, it faces some challenging problems, including dataset difference, small pedestrian targets and occlusions between pedestrians. To deal with these problems, we propose a novel convolutional network a...

Bibliographic Details
Main Authors: Qian Liu, Youwei Qi, Cunbao Wang
Format: Article
Language: English
Published: Elsevier 2024-01-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects: Object detection; Pedestrian detection; Feature fusion; Attention
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157823004408
_version_ 1827356713188065280
author Qian Liu
Youwei Qi
Cunbao Wang
author_facet Qian Liu
Youwei Qi
Cunbao Wang
author_sort Qian Liu
collection DOAJ
description Pedestrian detection has advanced substantially since the rise of convolutional neural networks. However, it still faces challenging problems, including differences between datasets, small pedestrian targets, and occlusion between pedestrians. To address these problems, we propose a novel convolutional network architecture named the multi-scale cross-layer fusion and center position network (MCF-CP-NET). A new backbone unit introduces channel-wise attention into improved aggregated residual transformations for effective feature extraction. To tackle dataset differences, we select anchor settings suited to pedestrian detection datasets. For better detection of small pedestrian targets, we develop a feature pyramid sub-network with cross-layer fusion, in which cross-layer connections reduce information loss and the dissipation of low-level marginal features and better fuse low- and high-level features. To better detect occluded pedestrians, we add a center position branch to the localization regression sub-network of MCF-CP-NET; it predicts the centrality index of the localization box to obtain a center score, which is then used to refine the score used in non-maximum suppression. Experiments show that, compared with the state of the art, MCF-CP-NET improves average precision and recall by 1.2% and 0.7%, respectively, on the person class of the Pascal VOC2007 dataset and by 1.6% and 0.1% on the WiderPerson dataset.
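The description says the center score refines the score used in non-maximum suppression but does not give the formula. The following is a minimal sketch of one plausible reading, not the authors' implementation: it assumes the ranking score is the product of the classification score and the center score, followed by ordinary greedy NMS. The function names, the product rule, and the 0.5 IoU threshold are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center_weighted_nms(
    boxes: Sequence[Box],
    cls_scores: Sequence[float],
    center_scores: Sequence[float],
    iou_threshold: float = 0.5,
) -> List[int]:
    """Greedy NMS ranked by cls_score * center_score; returns kept indices."""
    # Assumption: the two scores are combined by multiplication.
    ranked = [c * s for c, s in zip(cls_scores, center_scores)]
    order = sorted(range(len(boxes)), key=lambda i: ranked[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Made-up example: box 0 has the highest class score but a poor center score,
# so the better-centered box 1 wins and suppresses it.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = center_weighted_nms(boxes, [0.9, 0.8, 0.7], [0.5, 0.9, 1.0])
print(kept)
```

With these made-up inputs the call returns `[1, 2]`: weighting by the center score lets a well-centered box outrank an overlapping box whose raw classification score is higher, which is the intuition behind using such a score for occluded pedestrians.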
first_indexed 2024-03-08T05:13:55Z
format Article
id doaj.art-369aa44cc1414a229e923c080b96aa63
institution Directory Open Access Journal
issn 1319-1578
language English
last_indexed 2024-03-08T05:13:55Z
publishDate 2024-01-01
publisher Elsevier
record_format Article
series Journal of King Saud University: Computer and Information Sciences
spelling doaj.art-369aa44cc1414a229e923c080b96aa63; 2024-02-07T04:42:55Z; eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; 1319-1578; 2024-01-01; 36; 1; 101886; Multi-scale cross-layer fusion and center position network for pedestrian detection; Qian Liu (corresponding author), Youwei Qi, Cunbao Wang; all authors: School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science & Technology, Nanjing 210044, China; http://www.sciencedirect.com/science/article/pii/S1319157823004408; Object detection; Pedestrian detection; Feature fusion; Attention
spellingShingle Qian Liu
Youwei Qi
Cunbao Wang
Multi-scale cross-layer fusion and center position network for pedestrian detection
Journal of King Saud University: Computer and Information Sciences
Object detection
Pedestrian detection
Feature fusion
Attention
title Multi-scale cross-layer fusion and center position network for pedestrian detection
title_full Multi-scale cross-layer fusion and center position network for pedestrian detection
title_fullStr Multi-scale cross-layer fusion and center position network for pedestrian detection
title_full_unstemmed Multi-scale cross-layer fusion and center position network for pedestrian detection
title_short Multi-scale cross-layer fusion and center position network for pedestrian detection
title_sort multi scale cross layer fusion and center position network for pedestrian detection
topic Object detection
Pedestrian detection
Feature fusion
Attention
url http://www.sciencedirect.com/science/article/pii/S1319157823004408
work_keys_str_mv AT qianliu multiscalecrosslayerfusionandcenterpositionnetworkforpedestriandetection
AT youweiqi multiscalecrosslayerfusionandcenterpositionnetworkforpedestriandetection
AT cunbaowang multiscalecrosslayerfusionandcenterpositionnetworkforpedestriandetection