CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery
Timely and accurate acquisition of crop type information is significant for irrigation scheduling, yield estimation, harvesting arrangement, etc. The unmanned aerial vehicle (UAV) has emerged as an effective way to obtain high-resolution remote sensing images for crop type mapping. Convolutional neural network (CNN)-based methods have been widely used to predict crop types from UAV remote sensing imagery...
Main Authors: | Jianjian Xiang, Jia Liu, Du Chen, Qi Xiong, Chongjiu Deng |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-02-01 |
Series: | Remote Sensing |
Subjects: | precision agriculture; UAV remote sensing; semantic segmentation; deep learning; CNN; transformer |
Online Access: | https://www.mdpi.com/2072-4292/15/4/1151 |
_version_ | 1797618350542553088 |
---|---|
author | Jianjian Xiang; Jia Liu; Du Chen; Qi Xiong; Chongjiu Deng |
author_facet | Jianjian Xiang; Jia Liu; Du Chen; Qi Xiong; Chongjiu Deng |
author_sort | Jianjian Xiang |
collection | DOAJ |
description | Timely and accurate acquisition of crop type information is significant for irrigation scheduling, yield estimation, harvesting arrangement, etc. The unmanned aerial vehicle (UAV) has emerged as an effective way to obtain high-resolution remote sensing images for crop type mapping. Convolutional neural network (CNN)-based methods, which have excellent local feature extraction capabilities, have been widely used to predict crop types from UAV remote sensing imagery. However, their limited receptive fields restrict the capture of global contextual information. To address this issue, this study introduced a self-attention-based transformer that captures long-range feature dependencies of remote sensing imagery as a supplement to local details for accurate crop-type segmentation in UAV remote sensing imagery, and proposed an end-to-end CNN–transformer feature-fused network (CTFuseNet). The proposed CTFuseNet first provides a parallel structure of CNN and transformer branches in the encoder to extract both local and global semantic features from the imagery. A new feature-fusion module was designed to flexibly aggregate the multi-scale global and local features from the two branches. Finally, the FPNHead of the feature pyramid network serves as the decoder, improving adaptation to the multi-scale fused features and outputting the crop-type segmentation results. Comprehensive experiments indicated that the proposed CTFuseNet achieved higher crop-type-segmentation accuracy, with a mean intersection over union of 85.33% and a pixel accuracy of 92.46% on the benchmark remote sensing dataset, outperforming state-of-the-art networks including U-Net, PSPNet, DeepLabV3+, DANet, OCRNet, SETR, and SegFormer. Therefore, the proposed CTFuseNet is beneficial for crop-type segmentation, revealing the advantage of fusing the features extracted by the CNN and the transformer.
Further work is needed to improve the accuracy and efficiency of this approach, as well as to assess the model's transferability. |
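The parallel encoder-plus-fusion design summarized in the description can be sketched at the shape level as follows. This is a minimal illustrative sketch, not the authors' implementation: the stage count, strides, patch size, channel widths, and the concatenation-based fusion are all assumptions standing in for CTFuseNet's actual modules, and features are represented only by their `(channels, height, width)` shapes.

```python
# Shape-level sketch of a parallel CNN + transformer encoder with multi-scale
# feature fusion. All widths/strides below are illustrative assumptions.

def cnn_branch(shape, widths=(64, 128, 256, 512)):
    """CNN encoder: each stage halves the spatial size and sets a channel width,
    yielding a pyramid of local feature maps."""
    _, h, w = shape
    feats = []
    for width in widths:
        h, w = h // 2, w // 2
        feats.append((width, h, w))
    return feats

def transformer_branch(shape, patch=16, dim=384):
    """Transformer encoder: one global feature map at patch resolution."""
    _, h, w = shape
    return (dim, h // patch, w // patch)

def fuse(cnn_feats, trans_feat):
    """Fusion module sketch: upsample the global feature to each local scale
    and concatenate along channels (a common CNN-transformer fusion pattern)."""
    t_c, _, _ = trans_feat
    return [(c + t_c, h, w) for (c, h, w) in cnn_feats]

fused = fuse(cnn_branch((3, 256, 256)), transformer_branch((3, 256, 256)))
# Each fused scale keeps the CNN branch's resolution plus the transformer's
# channels, giving an FPN-style decoder multi-scale local+global features.
print(fused)
```

The key point the sketch illustrates is that the two branches produce features at different resolutions, so the fusion step must bring the single global map to every local scale before an FPN-style head can consume them.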
first_indexed | 2024-03-11T08:11:47Z |
format | Article |
id | doaj.art-5a988c81e0964b5e92859f0b5ea701d0 |
institution | Directory Open Access Journal |
issn | 2072-4292 |
language | English |
last_indexed | 2024-03-11T08:11:47Z |
publishDate | 2023-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj.art-5a988c81e0964b5e92859f0b5ea701d0 2023-11-16T23:04:22Z eng MDPI AG Remote Sensing 2072-4292 2023-02-01 Vol. 15, Iss. 4, Art. 1151 10.3390/rs15041151 CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery Jianjian Xiang; Jia Liu; Du Chen; Qi Xiong; Chongjiu Deng (all: School of Computer Science, China University of Geosciences, Wuhan 430074, China) https://www.mdpi.com/2072-4292/15/4/1151 precision agriculture; UAV remote sensing; semantic segmentation; deep learning; CNN; transformer |
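The record reports two segmentation metrics, pixel accuracy and mean intersection over union (mIoU). As a reference for how such figures are computed, here is a minimal stdlib sketch over flat label lists; the toy predictions are invented for illustration and are unrelated to the paper's dataset.

```python
# Minimal sketch of the two reported metrics: pixel accuracy and mIoU,
# computed from flat ground-truth/prediction label lists (toy data).

def pixel_accuracy(y_true, y_pred):
    """Fraction of pixels whose predicted class matches the ground truth."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def mean_iou(y_true, y_pred, num_classes):
    """Average over classes of intersection / union of pixel sets."""
    ious = []
    for c in range(num_classes):
        inter = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        union = sum(t == c or p == c for t, p in zip(y_true, y_pred))
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(pixel_accuracy(y_true, y_pred))  # 5/6 ≈ 0.833
print(mean_iou(y_true, y_pred, 3))
```

mIoU penalizes per-class confusion that overall pixel accuracy can hide, which is why segmentation papers such as this one report both.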
spellingShingle | Jianjian Xiang; Jia Liu; Du Chen; Qi Xiong; Chongjiu Deng; CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery; Remote Sensing; precision agriculture; UAV remote sensing; semantic segmentation; deep learning; CNN; transformer |
title | CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery |
title_full | CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery |
title_fullStr | CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery |
title_full_unstemmed | CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery |
title_short | CTFuseNet: A Multi-Scale CNN-Transformer Feature Fused Network for Crop Type Segmentation on UAV Remote Sensing Imagery |
title_sort | ctfusenet a multi scale cnn transformer feature fused network for crop type segmentation on uav remote sensing imagery |
topic | precision agriculture UAV remote sensing semantic segmentation deep learning CNN transformer |
url | https://www.mdpi.com/2072-4292/15/4/1151 |
work_keys_str_mv | AT jianjianxiang ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery AT jialiu ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery AT duchen ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery AT qixiong ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery AT chongjiudeng ctfusenetamultiscalecnntransformerfeaturefusednetworkforcroptypesegmentationonuavremotesensingimagery |