TANet: Transformer‐based asymmetric network for RGB‐D salient object detection
Abstract Existing RGB‐D salient object detection methods mainly rely on a symmetric two‐stream Convolutional Neural Network (CNN)‐based network to extract RGB and depth channel features separately. However, there are two problems with the symmetric conventional network structure: first, the ability of CNNs to learn global contexts is limited; second, the symmetric two‐stream structure ignores the inherent differences between modalities. In this study, a Transformer‐based asymmetric network is proposed to tackle these issues. The authors employ the powerful feature extraction capability of the Transformer to extract global semantic information from RGB data and design a lightweight CNN backbone to extract spatial structure information from depth data without pre‐training. The asymmetric hybrid encoder effectively reduces the number of parameters in the model and increases speed without sacrificing performance. Then, a cross‐modal feature fusion module that enhances and fuses RGB and depth features with each other is designed. Finally, the authors add edge prediction as an auxiliary task and propose an edge enhancement module to generate sharper contours. Extensive experiments demonstrate that the method achieves superior performance over 14 state‐of‐the‐art RGB‐D methods on six public datasets. The authors' code will be released at https://github.com/lc012463/TANet.
Main Authors: | Chang Liu, Gang Yang, Shuo Wang, Hangxu Wang, Yunhua Zhang, Yutao Wang |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2023-06-01 |
Series: | IET Computer Vision |
Subjects: | computer vision; image segmentation; object detection |
Online Access: | https://doi.org/10.1049/cvi2.12177 |
_version_ | 1827934519978622976 |
author | Chang Liu; Gang Yang; Shuo Wang; Hangxu Wang; Yunhua Zhang; Yutao Wang
author_facet | Chang Liu; Gang Yang; Shuo Wang; Hangxu Wang; Yunhua Zhang; Yutao Wang
author_sort | Chang Liu |
collection | DOAJ |
description | Abstract Existing RGB‐D salient object detection methods mainly rely on a symmetric two‐stream Convolutional Neural Network (CNN)‐based network to extract RGB and depth channel features separately. However, there are two problems with the symmetric conventional network structure: first, the ability of CNNs to learn global contexts is limited; second, the symmetric two‐stream structure ignores the inherent differences between modalities. In this study, a Transformer‐based asymmetric network is proposed to tackle these issues. The authors employ the powerful feature extraction capability of the Transformer to extract global semantic information from RGB data and design a lightweight CNN backbone to extract spatial structure information from depth data without pre‐training. The asymmetric hybrid encoder effectively reduces the number of parameters in the model and increases speed without sacrificing performance. Then, a cross‐modal feature fusion module that enhances and fuses RGB and depth features with each other is designed. Finally, the authors add edge prediction as an auxiliary task and propose an edge enhancement module to generate sharper contours. Extensive experiments demonstrate that the method achieves superior performance over 14 state‐of‐the‐art RGB‐D methods on six public datasets. The authors' code will be released at https://github.com/lc012463/TANet. |
first_indexed | 2024-03-13T07:40:40Z |
format | Article |
id | doaj.art-6f13e4d24bb8493fbabc0c006c58ead6 |
institution | Directory Open Access Journal |
issn | 1751-9632, 1751-9640
language | English |
last_indexed | 2024-03-13T07:40:40Z |
publishDate | 2023-06-01 |
publisher | Wiley |
record_format | Article |
series | IET Computer Vision |
spelling | doaj.art-6f13e4d24bb8493fbabc0c006c58ead6; 2023-06-03T07:26:31Z; eng; Wiley; IET Computer Vision; ISSN 1751-9632, 1751-9640; 2023-06-01; vol. 17, no. 4, pp. 415–430; 10.1049/cvi2.12177; TANet: Transformer‐based asymmetric network for RGB‐D salient object detection; Chang Liu, Gang Yang, Shuo Wang, Hangxu Wang, Yunhua Zhang, Yutao Wang (Northeastern University, Shenyang, Liaoning, China); https://doi.org/10.1049/cvi2.12177; computer vision; image segmentation; object detection |
spellingShingle | Chang Liu; Gang Yang; Shuo Wang; Hangxu Wang; Yunhua Zhang; Yutao Wang; TANet: Transformer‐based asymmetric network for RGB‐D salient object detection; IET Computer Vision; computer vision; image segmentation; object detection
title | TANet: Transformer‐based asymmetric network for RGB‐D salient object detection |
title_full | TANet: Transformer‐based asymmetric network for RGB‐D salient object detection |
title_fullStr | TANet: Transformer‐based asymmetric network for RGB‐D salient object detection |
title_full_unstemmed | TANet: Transformer‐based asymmetric network for RGB‐D salient object detection |
title_short | TANet: Transformer‐based asymmetric network for RGB‐D salient object detection |
title_sort | tanet transformer based asymmetric network for rgb d salient object detection |
topic | computer vision; image segmentation; object detection
url | https://doi.org/10.1049/cvi2.12177 |
work_keys_str_mv | AT changliu tanettransformerbasedasymmetricnetworkforrgbdsalientobjectdetection AT gangyang tanettransformerbasedasymmetricnetworkforrgbdsalientobjectdetection AT shuowang tanettransformerbasedasymmetricnetworkforrgbdsalientobjectdetection AT hangxuwang tanettransformerbasedasymmetricnetworkforrgbdsalientobjectdetection AT yunhuazhang tanettransformerbasedasymmetricnetworkforrgbdsalientobjectdetection AT yutaowang tanettransformerbasedasymmetricnetworkforrgbdsalientobjectdetection |