Scale-Invariant Multi-Level Context Aggregation Network for Weakly Supervised Building Extraction

Weakly supervised semantic segmentation (WSSS) methods, utilizing only image-level annotations, are gaining popularity for automated building extraction due to their advantages in eliminating the need for costly and time-consuming pixel-level labeling. Class activation maps (CAMs) are crucial for we...


Bibliographic Details
Main Authors:	Jicheng Wang, Xin Yan, Li Shen, Tian Lan, Xunqiang Gong, Zhilin Li
Format:	Article
Language:	English
Published:	MDPI AG 2023-03-01
Series:	Remote Sensing
Subjects:	building extraction; high-resolution remote sensing image; weakly supervised semantic segmentation; self-attentive aggregation; class activation map
Online Access:	https://www.mdpi.com/2072-4292/15/5/1432

_version_	1797614467153920000
author	Jicheng Wang, Xin Yan, Li Shen, Tian Lan, Xunqiang Gong, Zhilin Li
author_facet	Jicheng Wang, Xin Yan, Li Shen, Tian Lan, Xunqiang Gong, Zhilin Li
author_sort	Jicheng Wang
collection	DOAJ
description	Weakly supervised semantic segmentation (WSSS) methods, utilizing only image-level annotations, are gaining popularity for automated building extraction due to their advantages in eliminating the need for costly and time-consuming pixel-level labeling. Class activation maps (CAMs) are crucial for weakly supervised methods to generate pseudo-pixel-level labels for training networks in semantic segmentation. However, CAMs only activate the most discriminative regions, leading to inaccurate and incomplete results. To alleviate this, we propose a scale-invariant multi-level context aggregation network to improve the quality of CAMs in terms of fineness and completeness. The proposed method has integrated two novel modules into a Siamese network: (a) a self-attentive multi-level context aggregation module that generates and attentively aggregates multi-level CAMs to create fine-structured CAMs and (b) a scale-invariant optimization module that cooperates with mutual learning and coarse-to-fine optimization to improve the completeness of CAMs. The results of the experiments on two open building datasets demonstrate that our method achieves new state-of-the-art building extraction results using only image-level labels, producing more complete and accurate CAMs with an IoU of 0.6339 on the WHU dataset and 0.5887 on the Chicago dataset, respectively.
first_indexed	2024-03-11T07:11:48Z
format	Article
id	doaj.art-b49503f3786f49d5adeea662671a864a
institution	Directory of Open Access Journals
issn	2072-4292
language	English
last_indexed	2024-03-11T07:11:48Z
publishDate	2023-03-01
publisher	MDPI AG
record_format	Article
series	Remote Sensing
spelling	doaj.art-b49503f3786f49d5adeea662671a864a; 2023-11-17T08:33:06Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2023-03-01; 15; 5; 1432; 10.3390/rs15051432; Scale-Invariant Multi-Level Context Aggregation Network for Weakly Supervised Building Extraction; Jicheng Wang [0], Xin Yan [1], Li Shen [2], Tian Lan [3], Xunqiang Gong [4], Zhilin Li [5]; [0] Key Laboratory of Land Resources Evaluation and Monitoring in Southwest China of Ministry of Education, Sichuan Normal University, Chengdu 610068, China; [1] Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China; [2] Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China; [3] Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China; [4] Key Laboratory of Mine Environmental Monitoring and Improving around Poyang Lake of Ministry of Natural Resources, East China University of Technology, Nanchang 330013, China; [5] Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China; https://www.mdpi.com/2072-4292/15/5/1432; building extraction; high-resolution remote sensing image; weakly supervised semantic segmentation; self-attentive aggregation; class activation map
title	Scale-Invariant Multi-Level Context Aggregation Network for Weakly Supervised Building Extraction
title_sort	scale invariant multi level context aggregation network for weakly supervised building extraction
topic	building extraction; high-resolution remote sensing image; weakly supervised semantic segmentation; self-attentive aggregation; class activation map
url	https://www.mdpi.com/2072-4292/15/5/1432


Similar Items