MU-Net: Embedding MixFormer into Unet to Extract Water Bodies from Remote Sensing Images

Water body extraction is important for water resource utilization and for flood prevention and mitigation. Remote sensing images contain rich information, but complex spatial background features and noise interference cause problems such as inaccurate tributary extraction and inaccurate segmentat...

Bibliographic Details
Main Authors: Yonghong Zhang, Huanyu Lu, Guangyi Ma, Huajun Zhao, Donglin Xie, Sutong Geng, Wei Tian, Kenny Thiam Choy Lim Kam Sian
Format: Article
Language: English
Published: MDPI AG 2023-07-01
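Series: Remote Sensing
Subjects: attention mechanism, convolutional neural network, MixFormer, remote sensing, semantic segmentation, Transformer
Online Access: https://www.mdpi.com/2072-4292/15/14/3559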
author Yonghong Zhang
Huanyu Lu
Guangyi Ma
Huajun Zhao
Donglin Xie
Sutong Geng
Wei Tian
Kenny Thiam Choy Lim Kam Sian
author_sort Yonghong Zhang
collection DOAJ
description Water body extraction is important for water resource utilization and for flood prevention and mitigation. Remote sensing images contain rich information, but complex spatial background features and noise interference cause problems such as inaccurate tributary extraction and inaccurate segmentation when extracting water bodies. Convolutional neural networks (CNNs) have recently become popular for water body extraction. However, the local nature of convolution limits the extraction of global information, whereas the Transformer, with its self-attention mechanism, has great potential for modeling global information. This paper proposes MU-Net, a hybrid MixFormer architecture, as a novel method for automatically extracting water bodies. First, the MixFormer block is embedded into Unet. The combination of CNN and MixFormer models both the local spatial detail and the global contextual information of the image, improving the network's ability to capture semantic features of water bodies. Then, the features generated by the encoder are refined by an attention mechanism module to suppress interference from image background noise and non-water-body features, which further improves the accuracy of water body extraction. Experiments show that the method achieves higher segmentation accuracy and robust performance compared with mainstream CNN- and Transformer-based semantic segmentation networks. The proposed MU-Net achieves 90.25% and 76.52% IoU on the GID and LoveDA datasets, respectively. The experimental results also validate the potential of MixFormer in water extraction studies.
first_indexed 2024-03-11T00:41:33Z
format Article
id doaj.art-f5146bec59af4b3dbfa2cd442dbfcec8
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-03-11T00:41:33Z
publishDate 2023-07-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-f5146bec59af4b3dbfa2cd442dbfcec8
2023-11-18T21:12:29Z
eng
MDPI AG
Remote Sensing
2072-4292
2023-07-01
15
14
3559
10.3390/rs15143559
MU-Net: Embedding MixFormer into Unet to Extract Water Bodies from Remote Sensing Images
Yonghong Zhang [0]
Huanyu Lu [1]
Guangyi Ma [2]
Huajun Zhao [3]
Donglin Xie [4]
Sutong Geng [5]
Wei Tian [6]
Kenny Thiam Choy Lim Kam Sian [7]
[0] School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
[1] School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
[2] School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
[3] School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
[4] School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
[5] School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
[6] School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
[7] School of Atmospheric Science and Remote Sensing, Wuxi University, Wuxi 214105, China
https://www.mdpi.com/2072-4292/15/14/3559
attention mechanism
convolutional neural network
MixFormer
remote sensing
semantic segmentation
Transformer
title MU-Net: Embedding MixFormer into Unet to Extract Water Bodies from Remote Sensing Images
topic attention mechanism
convolutional neural network
MixFormer
remote sensing
semantic segmentation
Transformer
url https://www.mdpi.com/2072-4292/15/14/3559
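
The description reports segmentation quality as IoU (intersection over union). As a reading aid, here is a minimal sketch of that metric for a binary water mask. The function name `binary_iou`, the nested-list mask format, and the sample masks are illustrative assumptions, not taken from the paper. The record does not say how the authors aggregate IoU over images or classes, so this shows only the per-mask definition.

```python
from typing import Sequence


def binary_iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two binary masks (1 = water, 0 = background)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, target_row):
            p_water, t_water = bool(p), bool(t)
            intersection += p_water and t_water  # pixel is water in both masks
            union += p_water or t_water          # pixel is water in at least one mask
    # Convention assumed here: two empty masks count as perfect agreement.
    return intersection / union if union else 1.0


# Hypothetical 3x4 masks, for illustration only.
predicted = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
reference = [
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"IoU = {binary_iou(predicted, reference):.2%}")  # prints "IoU = 60.00%"
```

In this example the masks share 3 water pixels and together cover 5, giving 3/5. A reported score such as 90.25% therefore means that the overlap between predicted and reference water pixels is about nine tenths of their combined area, under whatever aggregation the authors used.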