CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery


Bibliographic Details
Main Authors: Jianjian Xiang, Jia Liu, Du Chen, Qi Xiong, Chongjiu Deng
Format: Article
Language: English
Published: MDPI AG, 2023-02-01
Series: Remote Sensing
Subjects:
Online Access: https://www.mdpi.com/2072-4292/15/4/1151
_version_ 1797618350542553088
author Jianjian Xiang
Jia Liu
Du Chen
Qi Xiong
Chongjiu Deng
author_facet Jianjian Xiang
Jia Liu
Du Chen
Qi Xiong
Chongjiu Deng
author_sort Jianjian Xiang
collection DOAJ
description Timely and accurate acquisition of crop-type information is important for irrigation scheduling, yield estimation, harvest planning, and related tasks. The unmanned aerial vehicle (UAV) has emerged as an effective way to obtain high-resolution remote sensing images for crop-type mapping. Convolutional neural network (CNN)-based methods, which have excellent local feature extraction capabilities, have been widely used to predict crop types from UAV remote sensing imagery. However, their limited receptive field hinders the capture of global contextual information. To address this issue, this study introduced a self-attention-based transformer that captures long-range feature dependencies in remote sensing imagery as a complement to local details for accurate crop-type segmentation in UAV remote sensing imagery, and proposed an end-to-end CNN–transformer feature-fused network (CTFuseNet). The proposed CTFuseNet first provided a parallel structure of CNN and transformer branches in the encoder to extract both local and global semantic features from the imagery. A new feature-fusion module was designed to flexibly aggregate the multi-scale global and local features from the two branches. Finally, the FPNHead of a feature pyramid network served as the decoder, improving adaptation to the multi-scale fused features, and output the crop-type segmentation results. Comprehensive experiments indicated that the proposed CTFuseNet achieved a higher crop-type-segmentation accuracy, with a mean intersection over union (mIoU) of 85.33% and a pixel accuracy of 92.46% on the benchmark remote sensing dataset, outperforming state-of-the-art networks including U-Net, PSPNet, DeepLabV3+, DANet, OCRNet, SETR, and SegFormer. The proposed CTFuseNet was therefore beneficial for crop-type segmentation, demonstrating the advantage of fusing the features extracted by the CNN and the transformer.
Further work is needed to improve the accuracy and efficiency of this approach and to assess the model's transferability.
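The two reported metrics can be made concrete with a short sketch. The following is a minimal, hypothetical illustration (not the authors' evaluation code) of how mean intersection over union and pixel accuracy are typically computed for semantic segmentation from a per-pixel confusion matrix:

```python
from typing import List


def confusion_matrix(y_true: List[int], y_pred: List[int], n_classes: int) -> List[List[int]]:
    """Per-pixel confusion matrix: rows = ground-truth class, cols = predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of pixels whose predicted class matches the ground truth."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm: List[List[int]]) -> float:
    """Mean over classes of TP / (TP + FP + FN); classes absent from both maps are skipped."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp
        fn = sum(cm[c]) - tp
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example: 8 pixels, 2 crop classes.
gt = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 1]
cm = confusion_matrix(gt, pred, 2)
print(pixel_accuracy(cm))  # 7/8 = 0.875
print(mean_iou(cm))        # mean of 3/4 and 4/5 = 0.775
```

In practice the ground-truth and predicted segmentation maps would be flattened label arrays over all test images, and the confusion matrix would be accumulated across the whole dataset before the two scores are taken.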
first_indexed 2024-03-11T08:11:47Z
format Article
id doaj.art-5a988c81e0964b5e92859f0b5ea701d0
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-03-11T08:11:47Z
publishDate 2023-02-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-5a988c81e0964b5e92859f0b5ea701d0 (2023-11-16T23:04:22Z); eng; MDPI AG; Remote Sensing; ISSN 2072-4292; published 2023-02-01, vol. 15, no. 4, art. 1151; DOI 10.3390/rs15041151; CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery; Jianjian Xiang, Jia Liu, Du Chen, Qi Xiong, Chongjiu Deng (all: School of Computer Science, China University of Geosciences, Wuhan 430074, China); abstract as given in the description field; https://www.mdpi.com/2072-4292/15/4/1151; keywords: precision agriculture; UAV remote sensing; semantic segmentation; deep learning; CNN; transformer
spellingShingle Jianjian Xiang
Jia Liu
Du Chen
Qi Xiong
Chongjiu Deng
CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery
Remote Sensing
precision agriculture
UAV remote sensing
semantic segmentation
deep learning
CNN
transformer
title CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery
title_full CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery
title_fullStr CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery
title_full_unstemmed CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery
title_short CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery
title_sort ctfusenet a multi scale cnn transformer feature fused network for crop type segmentation on uav remote sensing imagery
topic precision agriculture
UAV remote sensing
semantic segmentation
deep learning
CNN
transformer
url https://www.mdpi.com/2072-4292/15/4/1151
work_keys_str_mv AT jianjianxiang ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery
AT jialiu ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery
AT duchen ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery
AT qixiong ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery
AT chongjiudeng ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery