LiteST-Net: A Hybrid Model of Lite Swin Transformer and Convolution for Building Extraction from Remote Sensing Image

Extracting building data from remote sensing images is an efficient way to obtain geographic information, and with the emergence of deep learning, automatic building extraction from such images has become increasingly accurate. A CNN (...

Bibliographic Details
Main Authors: Wei Yuan, Xiaobo Zhang, Jibao Shi, Jin Wang
Format: Article
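Language: English
Published: MDPI AG 2023-04-01
Series: Remote Sensing
Subjects: building extraction; Lite Swin transformer; swin transformer; deep learning; remote sensing image
Online Access: https://www.mdpi.com/2072-4292/15/8/1996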
_version_ 1827743832795512832
author Wei Yuan
Xiaobo Zhang
Jibao Shi
Jin Wang
collection DOAJ
description Extracting building data from remote sensing images is an efficient way to obtain geographic information, and with the emergence of deep learning, automatic building extraction from such images has become increasingly accurate. The CNN (convolutional neural network) succeeded the fully connected network as a widely used architecture. It is computationally economical, translation invariant, and good at capturing local features, but it has difficulty capturing global features. Transformers can compensate for this shortcoming of CNNs and capture global features more effectively. However, the computational cost of transformers is excessive. To solve this problem, a Lite Swin transformer is proposed. The three matrices Q, K, and V of the transformer are reduced to a single V matrix, and the v of each pixel is then replaced by the v with the largest projection value on that pixel's feature vector. To better integrate global and local features, we propose the LiteST-Net model, in which the features extracted by the Lite Swin transformer and the CNN are added together and then upsampled step by step, making full use of the transformer's ability to capture global features and the CNN's ability to capture local features. Comparison experiments on two open datasets are carried out using the proposed LiteST-Net and several classical image segmentation models. The results show that LiteST-Net achieves the best values on all metrics among the compared networks, and its predicted images are closer to the labels.
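The V-only replacement step and the additive fusion described in the abstract can be illustrated with a small sketch. This is one possible reading of the abstract, not the authors' implementation: the function names `lite_window_attention` and `fuse_features` are hypothetical, the window is treated as a flat list of per-pixel v vectors, "projection value" is assumed to mean the scalar projection (v_j · v_i) / |v_i|, and a pixel is allowed to select its own v.

```python
import math


def lite_window_attention(values):
    """Replace each v in one window by the window's v whose scalar
    projection onto it is largest (assumed reading of the abstract)."""
    out = []
    for vi in values:
        norm = math.sqrt(sum(x * x for x in vi))
        if norm == 0.0:
            # A zero vector has no direction to project onto; keep it.
            out.append(list(vi))
            continue
        # Scalar projection of vj onto vi: (vj . vi) / |vi|
        best = max(
            values,
            key=lambda vj: sum(a * b for a, b in zip(vj, vi)) / norm,
        )
        out.append(list(best))
    return out


def fuse_features(global_feat, local_feat):
    """Element-wise addition of transformer-branch and CNN-branch
    features (both given as lists of equal-length vectors)."""
    return [
        [g + l for g, l in zip(g_vec, l_vec)]
        for g_vec, l_vec in zip(global_feat, local_feat)
    ]


# Toy window of three pixels with 2-dimensional v vectors.
window = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
selected = lite_window_attention(window)

# Stand-in for CNN features at the same three positions (made-up values).
cnn_features = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
fused = fuse_features(selected, cnn_features)
```

Under these assumptions no Q or K projection and no softmax is computed, which is where the reduced cost would come from; the step-by-step upsampling that follows the fusion is not shown.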
first_indexed 2024-03-11T04:35:04Z
format Article
id doaj.art-8458c3b2f9ba462d86b648c5ba361e8c
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-03-11T04:35:04Z
publishDate 2023-04-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-8458c3b2f9ba462d86b648c5ba361e8c; 2023-11-17T21:10:33Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2023-04-01; volume 15, issue 8, article 1996; 10.3390/rs15081996
Wei Yuan (College of Computer Science, Chengdu University, Chengdu 610106, China)
Xiaobo Zhang (Sichuan Urban Informatization Surveying and Mapping Engineering Technology Research Center, Chengdu 610084, China)
Jibao Shi (Sichuan Urban Informatization Surveying and Mapping Engineering Technology Research Center, Chengdu 610084, China)
Jin Wang (College of Computer Science, Chengdu University, Chengdu 610106, China)
title LiteST-Net: A Hybrid Model of Lite Swin Transformer and Convolution for Building Extraction from Remote Sensing Image
topic building extraction
Lite Swin transformer
swin transformer
deep learning
remote sensing image
url https://www.mdpi.com/2072-4292/15/8/1996