TCU-Net: Transformer and Convolutional Neural Network-Based Advanced U-Net for Concealed Object Detection

Bibliographic Details
Main Authors: Kyeong-Beom Park, Jae Yeol Lee
Format: Article
Language: English
Published: IEEE 2022-01-01
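Series: IEEE Access
Subjects: Advanced U-Net, camouflaged object detection, attentive Inception decoder, multi-dilated residual block, Swin Transformer
Online Access: https://ieeexplore.ieee.org/document/9955520/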
_version_ 1811322183555743744
author Kyeong-Beom Park
Jae Yeol Lee
collection DOAJ
description Camouflaged object detection (COD) seeks to find concealed objects hidden in natural surroundings. Unlike salient object detection, COD is challenging because it must separate foreground objects from background surroundings that are intrinsically similar to them. Convolutional neural network (CNN)-based approaches have been proposed to overcome this challenge, but they have inherent limitations in modeling and extracting global contexts. Transformer-based approaches, which can maintain the semantic features of input images, have been proposed to tackle this problem, but they are limited in learning localized spatial features within a limited receptive field. Therefore, one of the main challenges is to conduct accurate and robust COD that maintains global contexts without sacrificing low-level contexts. This study proposes a novel concealed object detection and segmentation method using a Transformer and CNN-based advanced U-Net (TCU-Net). TCU-Net extracts globalized semantic features using a Swin Transformer-based encoder and localized spatial features using an attentive Inception decoder. In particular, multi-dilated residual (MDR) blocks connecting the encoder and decoder generate refined multi-level features to improve discriminability. Finally, the attentive Inception decoder generates the final camouflaged object mask while maintaining the localized spatial information. Instead of simply up-sampling the feature map, the attentive Inception decoder conducts cascaded deconvolution through Inception and attention modules. The model is optimized with a weighted hybrid loss function consisting of the binary cross entropy (BCE) and intersection over union (IoU) losses. We comprehensively compared the proposed TCU-Net with previous studies by analyzing different metrics on four public datasets: CAMO, CHAMELEON, COD10K, and NC4K. An ablation study was also conducted on network architectures and loss functions to verify the advantages of the proposed approach. Experimental analysis on the public datasets shows that the proposed TCU-Net outperforms previous approaches.
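The description mentions a weighted hybrid loss built from BCE and IoU terms but gives no formula or weights. The following is only a minimal NumPy sketch of one common way to combine the two, not the authors' implementation. The function names, the soft (differentiable) IoU form, and the default weights `w_bce` and `w_iou` are all assumptions made for illustration.

```python
import numpy as np


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross entropy between predicted probabilities and a 0/1 mask."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    per_pixel = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(np.mean(per_pixel))


def soft_iou_loss(pred, target, eps=1e-7):
    """1 - soft IoU, where products and sums stand in for set intersection and union."""
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - intersection
    return float(1.0 - (intersection + eps) / (union + eps))


def hybrid_loss(pred, target, w_bce=1.0, w_iou=1.0):
    """Weighted sum of the BCE and IoU terms (weights here are hypothetical)."""
    return w_bce * bce_loss(pred, target) + w_iou * soft_iou_loss(pred, target)


# Usage: compare an exact prediction with a fully inverted one on a toy 2x2 mask.
mask = np.array([[1.0, 0.0], [0.0, 1.0]])
good = hybrid_loss(mask, mask)
bad = hybrid_loss(1.0 - mask, mask)
```

In this sketch the BCE term scores each pixel independently, while the IoU term scores the overlap of the whole predicted region, which is a common reason for pairing the two in segmentation losses.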
first_indexed 2024-04-13T13:30:47Z
format Article
id doaj.art-b325c37966fa4ddebeecc67b0ffeee64
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-13T13:30:47Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-b325c37966fa4ddebeecc67b0ffeee64
2022-12-22T02:44:59Z
eng
IEEE
IEEE Access
2169-3536
2022-01-01
Volume 10, pages 122347-122360
DOI: 10.1109/ACCESS.2022.3223424
Article number: 9955520
Kyeong-Beom Park (https://orcid.org/0000-0003-4737-730X), Department of Industrial Engineering, Chonnam National University, Bukgu, Gwangju, South Korea
Jae Yeol Lee (https://orcid.org/0000-0002-2653-0742), Department of Industrial Engineering, Chonnam National University, Bukgu, Gwangju, South Korea
title TCU-Net: Transformer and Convolutional Neural Network-Based Advanced U-Net for Concealed Object Detection
topic Advanced U-Net
camouflaged object detection
attentive Inception decoder
multi-dilated residual block
Swin Transformer
url https://ieeexplore.ieee.org/document/9955520/