TCU-Net: Transformer and Convolutional Neural Network-Based Advanced U-Net for Concealed Object Detection
Main Authors: | Kyeong-Beom Park, Jae Yeol Lee |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2022-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/9955520/ |
author | Kyeong-Beom Park, Jae Yeol Lee |
---|---|
collection | DOAJ |
description | Camouflaged object detection (COD) seeks to find concealed objects hidden in natural surroundings. Unlike salient object detection, COD is challenging because it must distinguish foreground objects from background surroundings that are intrinsically similar to them. Convolutional neural network (CNN)-based approaches have been proposed to overcome this challenge, but they have inherent limitations in modeling and extracting global context. Transformer-based approaches, which can maintain the semantic features of input images, have been proposed to tackle this problem, but they have limitations in learning localized spatial features within the limited receptive field. Therefore, one of the main challenges is to perform accurate and robust COD that maintains global context without sacrificing low-level context. This study proposes a novel concealed object detection and segmentation method using a Transformer and CNN-based advanced U-Net (TCU-Net). TCU-Net extracts global semantic features using a Swin Transformer-based encoder and localized spatial features using an attentive Inception decoder. In particular, multi-dilated residual (MDR) blocks connecting the encoder and decoder generate refined multi-level features to improve discriminability. Finally, the attentive Inception decoder generates the final camouflaged object mask while preserving localized spatial information. Instead of simply up-sampling the feature map, the attentive Inception decoder performs cascaded deconvolution through Inception and attention modules. The model is optimized with a weighted hybrid loss function that combines binary cross-entropy (BCE) and intersection-over-union (IoU) losses. We comprehensively compared TCU-Net with previous studies using different metrics on four public datasets: CAMO, CHAMELEON, COD10K, and NC4K. An ablation study was also conducted on network architectures and loss functions to verify the advantages of the proposed approach. Experimental analysis on these public datasets shows that the proposed TCU-Net outperforms previous approaches. |
first_indexed | 2024-04-13T13:30:47Z |
format | Article |
id | doaj.art-b325c37966fa4ddebeecc67b0ffeee64 |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-13T13:30:47Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
volume | 10 |
pages | 122347-122360 |
doi | 10.1109/ACCESS.2022.3223424 |
orcid | Kyeong-Beom Park: https://orcid.org/0000-0003-4737-730X; Jae Yeol Lee: https://orcid.org/0000-0002-2653-0742 |
affiliation | Department of Industrial Engineering, Chonnam National University, Bukgu, Gwangju, South Korea (both authors) |
title | TCU-Net: Transformer and Convolutional Neural Network-Based Advanced U-Net for Concealed Object Detection |
topic | Advanced U-Net; camouflaged object detection; attentive Inception decoder; multi-dilated residual block; Swin Transformer |
url | https://ieeexplore.ieee.org/document/9955520/ |
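The description states that TCU-Net is optimized with a weighted hybrid of BCE and IoU losses but does not give the weighting scheme. As a minimal sketch, assuming a simple scalar-weighted sum L = w_bce * BCE + w_iou * (1 - soft IoU) computed on sigmoid probabilities, such a loss could be written in Python with NumPy as below. The function name `hybrid_bce_iou_loss` and its default weights are hypothetical, not taken from the paper, which may use different weights or a pixel-wise weighting of each term.

```python
import numpy as np


def hybrid_bce_iou_loss(logits, target, w_bce=1.0, w_iou=1.0, eps=1e-7):
    """Hypothetical weighted hybrid loss: w_bce * BCE + w_iou * (1 - soft IoU).

    logits: raw (pre-sigmoid) mask predictions, any shape.
    target: binary ground-truth mask with the same shape, values in {0, 1}.
    w_bce, w_iou: assumed scalar weights, not values reported for TCU-Net.
    """
    x = np.asarray(logits, dtype=np.float64)
    y = np.asarray(target, dtype=np.float64)

    # Numerically stable BCE with logits, averaged over all pixels:
    # -[y*log(s(x)) + (1-y)*log(1-s(x))] == max(x, 0) - x*y + log(1 + exp(-|x|))
    bce = np.mean(np.maximum(x, 0.0) - x * y + np.log1p(np.exp(-np.abs(x))))

    # Sigmoid written via tanh to avoid overflow for large |x|.
    p = 0.5 * (1.0 + np.tanh(0.5 * x))

    # Soft IoU over the whole mask: sum(p*y) / sum(p + y - p*y).
    inter = np.sum(p * y)
    union = np.sum(p + y - p * y)
    iou_loss = 1.0 - (inter + eps) / (union + eps)

    return w_bce * bce + w_iou * iou_loss


if __name__ == "__main__":
    target = np.array([[1, 0], [0, 1]])
    good = np.array([[8.0, -8.0], [-8.0, 8.0]])  # confident and correct
    bad = -good                                  # confident and wrong
    print(hybrid_bce_iou_loss(good, target))  # small
    print(hybrid_bce_iou_loss(bad, target))   # large
```

The BCE term scores every pixel independently, while the IoU term is computed over the whole mask, so a few misclassified pixels on a small object move the IoU term more than the per-pixel average; this is a common motivation for pairing the two in segmentation losses.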