Mirror complementary transformer network for RGB‐thermal salient object detection
Abstract Conventional RGB‐T salient object detection treats RGB and thermal modalities equally to locate the common salient regions. However, the authors observed that the rich colour and texture information of the RGB modality makes the objects more prominent compared to the background; and the...
Main Authors: | Xiurong Jiang, Yifan Hou, Hui Tian, Lin Zhu |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2024-02-01 |
Series: | IET Computer Vision |
Subjects: | image segmentation, object detection |
Online Access: | https://doi.org/10.1049/cvi2.12221 |
_version_ | 1797320640286425088 |
---|---|
author | Xiurong Jiang Yifan Hou Hui Tian Lin Zhu |
author_facet | Xiurong Jiang Yifan Hou Hui Tian Lin Zhu |
author_sort | Xiurong Jiang |
collection | DOAJ |
description | Abstract Conventional RGB‐T salient object detection (SOD) treats the RGB and thermal modalities equally to locate the common salient regions. However, the authors observed that the rich colour and texture information of the RGB modality makes objects more prominent against the background, while the thermal modality records the temperature differences of the scene, so objects usually have clear and continuous edge information. In this work, a novel mirror‐complementary Transformer network (MCNet) is proposed for RGB‐T SOD, which supervises the two modalities separately with a complementary set of saliency labels under a symmetrical structure. Moreover, attention‐based feature interaction and serial multiscale dilated convolution (SDC)‐based feature fusion modules are introduced to make the two modalities complement and adjust each other flexibly. When one modality fails, the proposed model can still accurately segment the salient regions. To demonstrate the robustness of the proposed model in challenging real‐world scenes, the authors build a novel RGB‐T SOD dataset, VT723, based on a large public semantic segmentation RGB‐T dataset from the autonomous driving domain. Extensive experiments on the benchmark and VT723 datasets show that the proposed method outperforms state‐of‐the‐art approaches, including CNN‐based and Transformer‐based methods. The code and dataset can be found at https://github.com/jxr326/SwinMCNet. |
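The serial multiscale dilated-convolution (SDC) fusion mentioned in the abstract can be illustrated with a minimal 1-D NumPy sketch: stacking dilated convolutions in series grows the receptive field stage by stage, and summing the stage outputs crudely fuses the multiscale features. The function names, the 1-D simplification, and the sum-based fusion are illustrative assumptions only; the authors' module operates on 2-D feature maps inside the network and its exact design is given in the paper, not here.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Same'-padded 1-D dilated convolution (toy stand-in for the 2-D case)."""
    k = len(w)
    pad = dilation * (k - 1) // 2          # keeps output length equal to input
    xp = np.pad(x.astype(float), pad)
    out = np.zeros(len(x))
    for i in range(len(x)):
        for j in range(k):
            out[i] += w[j] * xp[i + j * dilation]  # taps spaced `dilation` apart
    return out

def serial_dilated_block(x, kernels, dilations):
    """Apply dilated convolutions in series (hypothetical SDC-style block):
    each stage widens the receptive field; stage outputs are summed as a
    simple multiscale fusion."""
    feats, h = [], x
    for w, d in zip(kernels, dilations):
        h = dilated_conv1d(h, w, d)
        feats.append(h)
    return np.sum(feats, axis=0)

# Usage: a smoothing kernel applied serially at dilations 1, 2, 4 aggregates
# context over an effective receptive field of 1 + 2*(1+2+4) = 15 samples.
signal = np.arange(16, dtype=float)
fused = serial_dilated_block(signal, [np.full(3, 1 / 3)] * 3, [1, 2, 4])
```

The serial (rather than parallel) arrangement means later stages see already-enlarged receptive fields, which is the usual motivation for chaining dilated convolutions.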
first_indexed | 2024-03-08T04:45:56Z |
format | Article |
id | doaj.art-8a0803edb223406d882f340da0de0619 |
institution | Directory Open Access Journal |
issn | 1751-9632 1751-9640 |
language | English |
last_indexed | 2024-03-08T04:45:56Z |
publishDate | 2024-02-01 |
publisher | Wiley |
record_format | Article |
series | IET Computer Vision |
spelling | doaj.art-8a0803edb223406d882f340da0de0619 2024-02-08T10:33:59Z eng Wiley IET Computer Vision 1751-9632 1751-9640 2024-02-01 18 1 15 32 10.1049/cvi2.12221 Mirror complementary transformer network for RGB‐thermal salient object detection. Xiurong Jiang (0), Yifan Hou (1), Hui Tian (2): State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China; Lin Zhu (3): School of Computer Science, Beijing Institute of Technology, Beijing, China. Abstract Conventional RGB‐T salient object detection treats RGB and thermal modalities equally to locate the common salient regions. However, the authors observed that the rich colour and texture information of the RGB modality makes the objects more prominent compared to the background; and the thermal modality records the temperature difference of the scene, so the objects usually contain clear and continuous edge information. In this work, a novel mirror‐complementary Transformer network (MCNet) is proposed for RGB‐T SOD, which supervises the two modalities separately with a complementary set of saliency labels under a symmetrical structure. Moreover, the attention‐based feature interaction and serial multiscale dilated convolution (SDC)‐based feature fusion modules are introduced to make the two modalities complement and adjust each other flexibly. When one modality fails, the proposed model can still accurately segment the salient regions. To demonstrate the robustness of the proposed model in challenging real‐world scenes, the authors build a novel RGB‐T SOD dataset VT723 based on a large public semantic segmentation RGB‐T dataset used in the autonomous driving domain. Extensive experiments on benchmark and VT723 datasets show that the proposed method outperforms state‐of‐the‐art approaches, including CNN‐based and Transformer‐based methods. The code and dataset can be found at https://github.com/jxr326/SwinMCNet. https://doi.org/10.1049/cvi2.12221 image segmentation; object detection |
spellingShingle | Xiurong Jiang Yifan Hou Hui Tian Lin Zhu Mirror complementary transformer network for RGB‐thermal salient object detection IET Computer Vision image segmentation object detection |
title | Mirror complementary transformer network for RGB‐thermal salient object detection |
title_full | Mirror complementary transformer network for RGB‐thermal salient object detection |
title_fullStr | Mirror complementary transformer network for RGB‐thermal salient object detection |
title_full_unstemmed | Mirror complementary transformer network for RGB‐thermal salient object detection |
title_short | Mirror complementary transformer network for RGB‐thermal salient object detection |
title_sort | mirror complementary transformer network for rgb thermal salient object detection |
topic | image segmentation object detection |
url | https://doi.org/10.1049/cvi2.12221 |
work_keys_str_mv | AT xiurongjiang mirrorcomplementarytransformernetworkforrgbthermalsalientobjectdetection AT yifanhou mirrorcomplementarytransformernetworkforrgbthermalsalientobjectdetection AT huitian mirrorcomplementarytransformernetworkforrgbthermalsalientobjectdetection AT linzhu mirrorcomplementarytransformernetworkforrgbthermalsalientobjectdetection |