Cross-modality feature fusion for night pedestrian detection
Night pedestrian detection with visible images alone suffers from high miss rates under poor illumination. Cross-modality fusion can ameliorate this problem by letting infrared and visible images supply complementary information to each other. In this paper, we propose a cross-modal fusion framework based on YOLOv5, aimed at addressing the challenges of night pedestrian detection under low-light conditions...
Main Authors: | Yong Feng, Enbo Luo, Hai Lu, SuWei Zhai |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2024-03-01 |
Series: | Frontiers in Physics |
Subjects: | pedestrian detection, YOLOv5, vision transformer, CNNs, feature fusion |
Online Access: | https://www.frontiersin.org/articles/10.3389/fphy.2024.1356248/full |
_version_ | 1797244456320106496 |
---|---|
author | Yong Feng Enbo Luo Hai Lu SuWei Zhai |
author_facet | Yong Feng Enbo Luo Hai Lu SuWei Zhai |
author_sort | Yong Feng |
collection | DOAJ |
description | Night pedestrian detection with visible images alone suffers from high miss rates under poor illumination. Cross-modality fusion can ameliorate this problem by letting infrared and visible images supply complementary information to each other. In this paper, we propose a cross-modal fusion framework based on YOLOv5, aimed at addressing the challenges of night pedestrian detection under low-light conditions. The framework employs a dual-stream architecture that processes visible and infrared images separately. Through the Cross-Modal Feature Rectification Module (CMFRM), visible and infrared features are rectified at a fine-grained level, leveraging their spatial correlations to focus on complementary information and substantially reduce uncertainty and noise across modalities. Additionally, we introduce a two-stage Feature Fusion Module (FFM): the first stage applies a cross-attention mechanism for cross-modal global reasoning, and the second stage uses a mixed channel embedding to produce the enhanced fused features. Moreover, our method involves multi-dimensional interaction: it not only rectifies feature maps along the channel and spatial dimensions but also applies cross-attention at the sequence level, which is critical for effectively generalizing cross-modal feature combinations. In summary, our work significantly improves the accuracy and robustness of nighttime pedestrian detection, offering new perspectives and technical pathways for visual information processing in low-light environments. |
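The architecture described in the abstract lends itself to a small illustration. Below is a minimal PyTorch sketch of the two fusion blocks it names, the Cross-Modal Feature Rectification Module (CMFRM) and the two-stage Feature Fusion Module (FFM). The module names follow the paper, but every internal detail here (the channel/spatial gating scheme, layer choices, attention configuration, and how the fused map would feed the YOLOv5 neck) is an assumption for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the dual-stream fusion blocks described in the abstract.
# Module names (CMFRM, FFM) follow the paper; all internals below are assumptions.
import torch
import torch.nn as nn


class CMFRM(nn.Module):
    """Cross-Modal Feature Rectification Module (sketch).

    Rectifies each modality's feature map along the channel and spatial
    dimensions using statistics computed from both modalities.
    """

    def __init__(self, channels, reduction=4):
        super().__init__()
        # Channel-wise weights from globally pooled features of both streams.
        self.channel_mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )
        # Spatial-wise gates from the concatenated feature maps.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb, ir):
        b, c, _, _ = rgb.shape
        # Channel rectification: attention derived from pooled statistics of both streams.
        pooled = torch.cat([rgb.mean(dim=(2, 3)), ir.mean(dim=(2, 3))], dim=1)
        w = self.channel_mlp(pooled).view(b, 2 * c, 1, 1)
        w_rgb, w_ir = w[:, :c], w[:, c:]
        # Spatial rectification: per-pixel gates from the concatenated maps.
        s = self.spatial_conv(torch.cat([rgb, ir], dim=1))
        s_rgb, s_ir = s[:, :1], s[:, 1:]
        # Cross-rectification: each modality is corrected by the other's cues.
        rgb_out = rgb + ir * w_ir * s_ir
        ir_out = ir + rgb * w_rgb * s_rgb
        return rgb_out, ir_out


class FFM(nn.Module):
    """Two-stage Feature Fusion Module (sketch).

    Stage 1: cross-attention between the modalities, treating spatial
    locations as token sequences for global reasoning.
    Stage 2: mixed channel embedding projecting the concatenated streams
    into a single fused feature map.
    """

    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn_rgb = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.attn_ir = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.mix = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb, ir):
        b, c, h, w = rgb.shape
        rgb_seq = rgb.flatten(2).transpose(1, 2)  # (B, H*W, C)
        ir_seq = ir.flatten(2).transpose(1, 2)
        # Stage 1: each modality queries the other.
        rgb_attn, _ = self.attn_rgb(rgb_seq, ir_seq, ir_seq)
        ir_attn, _ = self.attn_ir(ir_seq, rgb_seq, rgb_seq)
        rgb_f = (rgb_seq + rgb_attn).transpose(1, 2).view(b, c, h, w)
        ir_f = (ir_seq + ir_attn).transpose(1, 2).view(b, c, h, w)
        # Stage 2: mixed channel embedding produces the fused output that
        # would feed a detector neck/head (e.g., YOLOv5).
        return self.mix(torch.cat([rgb_f, ir_f], dim=1))


if __name__ == "__main__":
    rgb = torch.randn(1, 64, 80, 80)  # visible-stream feature map
    ir = torch.randn(1, 64, 80, 80)   # infrared-stream feature map
    rgb_r, ir_r = CMFRM(64)(rgb, ir)
    fused = FFM(64)(rgb_r, ir_r)
    print(fused.shape)  # torch.Size([1, 64, 80, 80])
```

In this sketch, channel rectification reweights each stream with statistics pooled from both modalities, spatial rectification applies per-pixel gates from the concatenated maps, and the FFM flattens the feature maps into token sequences so each modality can attend to the other before a 1x1 convolution mixes the channels into a single fused map.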
first_indexed | 2024-04-24T19:11:18Z |
format | Article |
id | doaj.art-fb5e8140ccc8489e94dd13213875695f |
institution | Directory Open Access Journal |
issn | 2296-424X |
language | English |
last_indexed | 2024-04-24T19:11:18Z |
publishDate | 2024-03-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Physics |
spelling | doaj.art-fb5e8140ccc8489e94dd13213875695f | 2024-03-26T10:51:29Z | eng | Frontiers Media S.A. | Frontiers in Physics | 2296-424X | 2024-03-01 | 12 | 10.3389/fphy.2024.1356248 | 1356248 | Cross-modality feature fusion for night pedestrian detection | Yong Feng, Enbo Luo, Hai Lu, SuWei Zhai | Night pedestrian detection with visible images alone suffers from high miss rates under poor illumination. Cross-modality fusion can ameliorate this problem by letting infrared and visible images supply complementary information to each other. In this paper, we propose a cross-modal fusion framework based on YOLOv5, aimed at addressing the challenges of night pedestrian detection under low-light conditions. The framework employs a dual-stream architecture that processes visible and infrared images separately. Through the Cross-Modal Feature Rectification Module (CMFRM), visible and infrared features are rectified at a fine-grained level, leveraging their spatial correlations to focus on complementary information and substantially reduce uncertainty and noise across modalities. Additionally, we introduce a two-stage Feature Fusion Module (FFM): the first stage applies a cross-attention mechanism for cross-modal global reasoning, and the second stage uses a mixed channel embedding to produce the enhanced fused features. Moreover, our method involves multi-dimensional interaction: it not only rectifies feature maps along the channel and spatial dimensions but also applies cross-attention at the sequence level, which is critical for effectively generalizing cross-modal feature combinations. In summary, our work significantly improves the accuracy and robustness of nighttime pedestrian detection, offering new perspectives and technical pathways for visual information processing in low-light environments. | https://www.frontiersin.org/articles/10.3389/fphy.2024.1356248/full | pedestrian detection | YOLOv5 | vision transformer | CNNs | feature fusion |
spellingShingle | Yong Feng Enbo Luo Hai Lu SuWei Zhai Cross-modality feature fusion for night pedestrian detection Frontiers in Physics pedestrian detection YOLOv5 vision transformer CNNs feature fusion |
title | Cross-modality feature fusion for night pedestrian detection |
title_full | Cross-modality feature fusion for night pedestrian detection |
title_fullStr | Cross-modality feature fusion for night pedestrian detection |
title_full_unstemmed | Cross-modality feature fusion for night pedestrian detection |
title_short | Cross-modality feature fusion for night pedestrian detection |
title_sort | cross modality feature fusion for night pedestrian detection |
topic | pedestrian detection YOLOv5 vision transformer CNNs feature fusion |
url | https://www.frontiersin.org/articles/10.3389/fphy.2024.1356248/full |
work_keys_str_mv | AT yongfeng crossmodalityfeaturefusionfornightpedestriandetection AT enboluo crossmodalityfeaturefusionfornightpedestriandetection AT hailu crossmodalityfeaturefusionfornightpedestriandetection AT suweizhai crossmodalityfeaturefusionfornightpedestriandetection |