TCU-Net: Transformer and Convolutional Neural Network-Based Advanced U-Net for Concealed Object Detection

Camouflaged object detection (COD) seeks to find concealed objects hidden in natural surroundings. Unlike salient object detection, COD is challenging because foreground objects are intrinsically similar to their background surroundings. Convolutional neural network (CNN)-based a...

Bibliographic Details
Main Authors:	Kyeong-Beom Park, Jae Yeol Lee
Format:	Article
Language:	English
Published:	IEEE 2022-01-01
Series:	IEEE Access
Subjects:	Advanced U-Net camouflaged object detection attentive Inception decoder multi-dilated residual block Swin Transformer
Online Access:	https://ieeexplore.ieee.org/document/9955520/

_version_	1811322183555743744
author	Kyeong-Beom Park Jae Yeol Lee
author_facet	Kyeong-Beom Park Jae Yeol Lee
author_sort	Kyeong-Beom Park
collection	DOAJ
description	Camouflaged object detection (COD) seeks to find concealed objects hidden in natural surroundings. Unlike salient object detection, COD is challenging because foreground objects are intrinsically similar to their background surroundings. Convolutional neural network (CNN)-based approaches have been proposed to overcome this challenge, but they have inherent limitations in modeling and extracting global contexts. Transformer-based approaches, which can maintain the semantic features of input images, have been proposed to tackle this problem, but they are limited in learning localized spatial features within a limited receptive field. Therefore, one of the main challenges is to conduct accurate and robust COD while maintaining global contexts without sacrificing low-level contexts. This study proposes a novel concealed object detection and segmentation method using a Transformer and CNN-based advanced U-Net (TCU-Net). TCU-Net extracts globalized semantic features using a Swin Transformer-based encoder and localized spatial features using an attentive Inception decoder. In particular, multi-dilated residual (MDR) blocks connecting the encoder and decoder generate refined multi-level features to improve discriminability. Finally, the attentive Inception decoder generates the final camouflaged object mask while maintaining the localized spatial information. Instead of simply up-sampling the feature map, the attentive Inception decoder conducts cascaded deconvolution through Inception and attention modules. The model is optimized with a weighted hybrid loss function consisting of the binary cross entropy (BCE) and intersection over union (IoU) losses. We comprehensively compared the proposed TCU-Net with previous studies by analyzing different metrics on four public datasets: CAMO, CHAMELEON, COD10K, and NC4K. An ablation study was also conducted to evaluate network architectures and loss functions and to verify the advantages of the proposed approach. Experimental analysis on the public datasets shows that the proposed TCU-Net outperforms previous approaches.
first_indexed	2024-04-13T13:30:47Z
format	Article
id	doaj.art-b325c37966fa4ddebeecc67b0ffeee64
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-04-13T13:30:47Z
publishDate	2022-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-b325c37966fa4ddebeecc67b0ffeee64 | 2022-12-22T02:44:59Z | eng | IEEE | IEEE Access | 2169-3536 | 2022-01-01 | 10 | 122347 | 122360 | 10.1109/ACCESS.2022.3223424 | 9955520 | TCU-Net: Transformer and Convolutional Neural Network-Based Advanced U-Net for Concealed Object Detection | Kyeong-Beom Park (0), https://orcid.org/0000-0003-4737-730X | Jae Yeol Lee (1), https://orcid.org/0000-0002-2653-0742 | Department of Industrial Engineering, Chonnam National University, Bukgu, Gwangju, South Korea | Department of Industrial Engineering, Chonnam National University, Bukgu, Gwangju, South Korea | https://ieeexplore.ieee.org/document/9955520/ | Advanced U-Net | camouflaged object detection | attentive Inception decoder | multi-dilated residual block | Swin Transformer
spellingShingle	Kyeong-Beom Park Jae Yeol Lee TCU-Net: Transformer and Convolutional Neural Network-Based Advanced U-Net for Concealed Object Detection IEEE Access Advanced U-Net camouflaged object detection attentive Inception decoder multi-dilated residual block Swin Transformer
title	TCU-Net: Transformer and Convolutional Neural Network-Based Advanced U-Net for Concealed Object Detection
title_full	TCU-Net: Transformer and Convolutional Neural Network-Based Advanced U-Net for Concealed Object Detection
title_fullStr	TCU-Net: Transformer and Convolutional Neural Network-Based Advanced U-Net for Concealed Object Detection
title_full_unstemmed	TCU-Net: Transformer and Convolutional Neural Network-Based Advanced U-Net for Concealed Object Detection
title_short	TCU-Net: Transformer and Convolutional Neural Network-Based Advanced U-Net for Concealed Object Detection
title_sort	tcu net transformer and convolutional neural network based advanced u net for concealed object detection
topic	Advanced U-Net camouflaged object detection attentive Inception decoder multi-dilated residual block Swin Transformer
url	https://ieeexplore.ieee.org/document/9955520/
work_keys_str_mv	AT kyeongbeompark tcunettransformerandconvolutionalneuralnetworkbasedadvancedunetforconcealedobjectdetection AT jaeyeollee tcunettransformerandconvolutionalneuralnetworkbasedadvancedunetforconcealedobjectdetection
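
The description above mentions a weighted hybrid loss built from the binary cross entropy (BCE) and intersection over union (IoU) losses. The record does not say how the two terms are weighted or how the IoU term is made differentiable, so the following is only a minimal sketch under stated assumptions: scalar weights `w_bce` and `w_iou` (hypothetical names, defaulting to 1.0), a soft IoU computed on predicted probabilities, and plain Python lists standing in for flattened mask tensors. It is not the authors' implementation.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross entropy over flattened probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def iou_loss(pred, target, eps=1e-7):
    """Soft IoU loss: 1 - intersection / union, computed on probabilities."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)


def hybrid_loss(pred, target, w_bce=1.0, w_iou=1.0):
    """Weighted sum of BCE and soft IoU (the scalar weights are an assumption)."""
    return w_bce * bce_loss(pred, target) + w_iou * iou_loss(pred, target)


if __name__ == "__main__":
    mask = [1.0, 0.0, 1.0, 0.0]        # toy ground-truth mask, flattened
    prediction = [0.9, 0.2, 0.7, 0.1]  # toy predicted probabilities
    print(hybrid_loss(prediction, mask))
```

The BCE term penalizes each pixel independently, while the IoU term scores the overlap of the whole predicted region with the mask, which is the usual motivation for combining the two in segmentation losses.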

Similar Items