An Efficient and Light Transformer-Based Segmentation Network for Remote Sensing Images of Landscapes

High-resolution image segmentation for landscape applications has garnered significant attention, particularly in the context of ultra-high-resolution (UHR) imagery. Current segmentation methodologies partition UHR images into standard patches for multiscale local segmentation and hierarchical reaso...


Bibliographic Details
Main Authors: Lijia Chen, Honghui Chen, Yanqiu Xie, Tianyou He, Jing Ye, Yushan Zheng
Format: Article
Language: English
Published: MDPI AG 2023-11-01
Series: Forests
Subjects: ultra-high-resolution image; segmentation quality; multilevel semantic contexts; transformer
Online Access: https://www.mdpi.com/1999-4907/14/11/2271
author Lijia Chen
Honghui Chen
Yanqiu Xie
Tianyou He
Jing Ye
Yushan Zheng
collection DOAJ
description High-resolution image segmentation for landscape applications has garnered significant attention, particularly in the context of ultra-high-resolution (UHR) imagery. Current segmentation methodologies partition UHR images into standard patches for multiscale local segmentation and hierarchical reasoning. This creates a pressing dilemma: the trade-off between memory efficiency and segmentation quality becomes increasingly evident. This paper introduces the Multilevel Contexts Weighted Coupling Transformer (WCTNet) for UHR segmentation. The framework comprises the Multi-level Feature Weighting (MFW) module and the Token-based Transformer (TT), designed to weight and couple multilevel semantic contexts. First, we analyze the multilevel semantics within a local patch without image-level contextual reasoning, which avoids complex image-level contextual associations and the misleading information they carry. Second, MFW is developed to weight shallow and deep features, enhancing object-related attention at different granularities across multilevel semantics. Third, the TT module is introduced to couple multilevel semantic contexts and transform them into semantic tokens using spatial attention, so that token interactions can be captured and clearer local representations obtained. The proposed contextual weighting and coupling of single-scale patches enables WCTNet to maintain a good balance between accuracy and computational overhead. Experimental results show that WCTNet achieves state-of-the-art performance on two UHR datasets, DeepGlobe and Inria Aerial.
first_indexed 2024-03-09T16:49:09Z
format Article
id doaj.art-a45965c99e814d129d2fe1cca1a6b174
institution Directory of Open Access Journals
issn 1999-4907
language English
last_indexed 2024-03-09T16:49:09Z
publishDate 2023-11-01
publisher MDPI AG
record_format Article
series Forests
spelling doaj.art-a45965c99e814d129d2fe1cca1a6b174
2023-11-24T14:42:54Z
eng
MDPI AG
Forests, ISSN 1999-4907, 2023-11-01, vol. 14, no. 11, article 2271
10.3390/f14112271
An Efficient and Light Transformer-Based Segmentation Network for Remote Sensing Images of Landscapes
Lijia Chen: College of Landscape Architecture, Fujian Agriculture and Forest University, Fuzhou 350002, China
Honghui Chen: Department of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
Yanqiu Xie: College of Landscape Architecture, Fujian Agriculture and Forest University, Fuzhou 350002, China
Tianyou He: College of Landscape Architecture, Fujian Agriculture and Forest University, Fuzhou 350002, China
Jing Ye: College of Landscape Architecture, Fujian Agriculture and Forest University, Fuzhou 350002, China
Yushan Zheng: College of Landscape Architecture, Fujian Agriculture and Forest University, Fuzhou 350002, China
High-resolution image segmentation for landscape applications has garnered significant attention, particularly in the context of ultra-high-resolution (UHR) imagery. Current segmentation methodologies partition UHR images into standard patches for multiscale local segmentation and hierarchical reasoning. This creates a pressing dilemma: the trade-off between memory efficiency and segmentation quality becomes increasingly evident. This paper introduces the Multilevel Contexts Weighted Coupling Transformer (WCTNet) for UHR segmentation. The framework comprises the Multi-level Feature Weighting (MFW) module and the Token-based Transformer (TT), designed to weight and couple multilevel semantic contexts. First, we analyze the multilevel semantics within a local patch without image-level contextual reasoning, which avoids complex image-level contextual associations and the misleading information they carry. Second, MFW is developed to weight shallow and deep features, enhancing object-related attention at different granularities across multilevel semantics. Third, the TT module is introduced to couple multilevel semantic contexts and transform them into semantic tokens using spatial attention, so that token interactions can be captured and clearer local representations obtained. The proposed contextual weighting and coupling of single-scale patches enables WCTNet to maintain a good balance between accuracy and computational overhead. Experimental results show that WCTNet achieves state-of-the-art performance on two UHR datasets, DeepGlobe and Inria Aerial.
https://www.mdpi.com/1999-4907/14/11/2271
ultra-high-resolution image
segmentation quality
multilevel semantic contexts
transformer
title An Efficient and Light Transformer-Based Segmentation Network for Remote Sensing Images of Landscapes
topic ultra-high-resolution image
segmentation quality
multilevel semantic contexts
transformer
url https://www.mdpi.com/1999-4907/14/11/2271