CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery


Bibliographic Details
Main Authors: Jianjian Xiang, Jia Liu, Du Chen, Qi Xiong, Chongjiu Deng
Format: Article
Language: English
Published: MDPI AG, 2023-02-01
Series: Remote Sensing
Subjects:
Online Access: https://www.mdpi.com/2072-4292/15/4/1151
_version_ 1797618350542553088
author Jianjian Xiang
Jia Liu
Du Chen
Qi Xiong
Chongjiu Deng
author_facet Jianjian Xiang
Jia Liu
Du Chen
Qi Xiong
Chongjiu Deng
author_sort Jianjian Xiang
collection DOAJ
description Timely and accurate acquisition of crop-type information is important for irrigation scheduling, yield estimation, harvest planning, and related tasks. The unmanned aerial vehicle (UAV) has emerged as an effective way to obtain high-resolution remote sensing images for crop-type mapping. Convolutional neural network (CNN)-based methods, which have excellent local feature extraction capabilities, have been widely used to predict crop types from UAV remote sensing imagery. However, their limited receptive field hinders the capture of global contextual information. To address this issue, this study introduced a self-attention-based transformer that captures long-range feature dependencies in remote sensing imagery as a complement to local details for accurate crop-type segmentation in UAV remote sensing imagery, and proposed an end-to-end CNN–transformer feature-fused network (CTFuseNet). The proposed CTFuseNet first provided a parallel structure of CNN and transformer branches in the encoder to extract both local and global semantic features from the imagery. A new feature-fusion module was designed to flexibly aggregate the multi-scale global and local features from the two branches. Finally, the FPNHead of a feature pyramid network served as the decoder, improving adaptation to the multi-scale fused features, and output the crop-type segmentation results. Comprehensive experiments indicated that the proposed CTFuseNet achieved a higher crop-type-segmentation accuracy, with a mean intersection over union (mIoU) of 85.33% and a pixel accuracy of 92.46% on the benchmark remote sensing dataset, outperforming state-of-the-art networks including U-Net, PSPNet, DeepLabV3+, DANet, OCRNet, SETR, and SegFormer. The proposed CTFuseNet was therefore beneficial for crop-type segmentation, demonstrating the advantage of fusing the features extracted by the CNN and the transformer.
Further work is needed to improve the accuracy and efficiency of this approach and to assess the model's transferability.
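The two reported metrics can be made concrete with a short sketch. The following is a minimal, hypothetical illustration (not the authors' evaluation code) of how mean intersection over union and pixel accuracy are typically computed for semantic segmentation from a per-pixel confusion matrix:

```python
from typing import List


def confusion_matrix(y_true: List[int], y_pred: List[int], n_classes: int) -> List[List[int]]:
    """Per-pixel confusion matrix: rows = ground-truth class, cols = predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of pixels whose predicted class matches the ground truth."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm: List[List[int]]) -> float:
    """Mean over classes of TP / (TP + FP + FN); classes absent from both maps are skipped."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp
        fn = sum(cm[c]) - tp
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example: 8 pixels, 2 crop classes.
gt = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 1]
cm = confusion_matrix(gt, pred, 2)
print(pixel_accuracy(cm))  # 7/8 = 0.875
print(mean_iou(cm))        # mean of 3/4 and 4/5 = 0.775
```

In practice the ground-truth and predicted segmentation maps would be flattened label arrays over all test images, and the confusion matrix would be accumulated across the whole dataset before the two scores are taken.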
first_indexed 2024-03-11T08:11:47Z
format Article
id doaj.art-5a988c81e0964b5e92859f0b5ea701d0
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-03-11T08:11:47Z
publishDate 2023-02-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-5a988c81e0964b5e92859f0b5ea701d0 (2023-11-16T23:04:22Z); eng; MDPI AG; Remote Sensing; ISSN 2072-4292; published 2023-02-01, vol. 15, no. 4, art. 1151; DOI 10.3390/rs15041151; CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery; Jianjian Xiang, Jia Liu, Du Chen, Qi Xiong, Chongjiu Deng (all: School of Computer Science, China University of Geosciences, Wuhan 430074, China); abstract as given in the description field; https://www.mdpi.com/2072-4292/15/4/1151; keywords: precision agriculture; UAV remote sensing; semantic segmentation; deep learning; CNN; transformer
spellingShingle Jianjian Xiang
Jia Liu
Du Chen
Qi Xiong
Chongjiu Deng
CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery
Remote Sensing
precision agriculture
UAV remote sensing
semantic segmentation
deep learning
CNN
transformer
title CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery
title_full CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery
title_fullStr CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery
title_full_unstemmed CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery
title_short CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery
title_sort ctfusenet a multi scale cnn transformer feature fused network for crop type segmentation on uav remote sensing imagery
topic precision agriculture
UAV remote sensing
semantic segmentation
deep learning
CNN
transformer
url https://www.mdpi.com/2072-4292/15/4/1151
work_keys_str_mv AT jianjianxiang ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery
AT jialiu ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery
AT duchen ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery
AT qixiong ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery
AT chongjiudeng ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery