Toward reliable fusion object detection based on dilated pyramid and semantic attention
Abstract Object detection on fused visible and infrared images is important for many applications, for example, surveillance and rescue in low‐light conditions. However, current detectors struggle to detect objects robustly in fused images for two main reasons. First, objects are...
Main Authors: | Rong Chang, Shan Gao, Hao Li, Shan Zhao, Yang Yang |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2024-02-01 |
Series: | Engineering Reports |
Subjects: | attention mechanism, fused image, object detection |
Online Access: | https://doi.org/10.1002/eng2.12714 |
_version_ | 1797335310263123968 |
---|---|
author | Rong Chang; Shan Gao; Hao Li; Shan Zhao; Yang Yang |
author_facet | Rong Chang; Shan Gao; Hao Li; Shan Zhao; Yang Yang |
author_sort | Rong Chang |
collection | DOAJ |
description | Abstract Object detection on fused visible and infrared images is important for many applications, for example, surveillance and rescue in low‐light conditions. However, current detectors struggle to detect objects robustly in fused images for two main reasons. First, objects appear in various shapes and sizes, so some hard samples cannot be localized accurately. Second, objects of the same category can look different in fused images because of changing weather conditions, temperature, and intrinsic heat. This inconsistency degrades the classification task of a detection network, since the network cannot merge commonalities and distinguish differences well. In this paper, we propose to restructure the detection pipeline of current detectors and to enhance detection of difficult samples in fused images. Specifically, a Dilation Pyramid Network (DPN) is designed at the lateral connections to generate and aggregate features with various receptive fields, without adding pyramid layers. To strengthen classification, a Semantic Category Attention Module (SCAM) is proposed to capture the attention centers of semantics in fused images, rather than object centers. Extensive experiments on two fusion datasets show that the proposed method achieves satisfactory performance and that both modules can substantially improve current generic detectors on fused images. |
first_indexed | 2024-03-08T08:35:07Z |
format | Article |
id | doaj.art-3a2160e058b44468958d047617efe70a |
institution | Directory Open Access Journal |
issn | 2577-8196 |
language | English |
last_indexed | 2024-03-08T08:35:07Z |
publishDate | 2024-02-01 |
publisher | Wiley |
record_format | Article |
series | Engineering Reports |
spelling | doaj.art-3a2160e058b44468958d047617efe70a; 2024-02-02T01:25:53Z; eng; Wiley; Engineering Reports; 2577-8196; 2024-02-01; vol. 6, no. 2; pages n/a; 10.1002/eng2.12714; Toward reliable fusion object detection based on dilated pyramid and semantic attention; Rong Chang (Yuxi Power Supply Bureau, Yunnan Power Grid Co., LTD of Kunming, Yunnan, China); Shan Gao (Guangzhou Jianruan Technology Co., LTD, Guangzhou, China); Hao Li, Shan Zhao, Yang Yang (School of Information Science and Technology, Yunnan Normal University, Yunnan, China); https://doi.org/10.1002/eng2.12714; attention mechanism; fused image; object detection |
title | Toward reliable fusion object detection based on dilated pyramid and semantic attention |
topic | attention mechanism; fused image; object detection |
url | https://doi.org/10.1002/eng2.12714 |