Scale-Invariant Multi-Level Context Aggregation Network for Weakly Supervised Building Extraction

Weakly supervised semantic segmentation (WSSS) methods, utilizing only image-level annotations, are gaining popularity for automated building extraction due to their advantages in eliminating the need for costly and time-consuming pixel-level labeling. Class activation maps (CAMs) are crucial for we...

Bibliographic Details
Main Authors: Jicheng Wang, Xin Yan, Li Shen, Tian Lan, Xunqiang Gong, Zhilin Li
Format: Article
Language: English
Published: MDPI AG 2023-03-01
Series: Remote Sensing
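Subjects: building extraction; high-resolution remote sensing image; weakly supervised semantic segmentation; self-attentive aggregation; class activation map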
Online Access: https://www.mdpi.com/2072-4292/15/5/1432
author Jicheng Wang
Xin Yan
Li Shen
Tian Lan
Xunqiang Gong
Zhilin Li
collection DOAJ
description Weakly supervised semantic segmentation (WSSS) methods, utilizing only image-level annotations, are gaining popularity for automated building extraction due to their advantages in eliminating the need for costly and time-consuming pixel-level labeling. Class activation maps (CAMs) are crucial for weakly supervised methods to generate pseudo-pixel-level labels for training networks in semantic segmentation. However, CAMs only activate the most discriminative regions, leading to inaccurate and incomplete results. To alleviate this, we propose a scale-invariant multi-level context aggregation network to improve the quality of CAMs in terms of fineness and completeness. The proposed method has integrated two novel modules into a Siamese network: (a) a self-attentive multi-level context aggregation module that generates and attentively aggregates multi-level CAMs to create fine-structured CAMs and (b) a scale-invariant optimization module that cooperates with mutual learning and coarse-to-fine optimization to improve the completeness of CAMs. The results of the experiments on two open building datasets demonstrate that our method achieves new state-of-the-art building extraction results using only image-level labels, producing more complete and accurate CAMs with an IoU of 0.6339 on the WHU dataset and 0.5887 on the Chicago dataset, respectively.
first_indexed 2024-03-11T07:11:48Z
format Article
id doaj.art-b49503f3786f49d5adeea662671a864a
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-03-11T07:11:48Z
publishDate 2023-03-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-b49503f3786f49d5adeea662671a864a
2023-11-17T08:33:06Z
eng
MDPI AG
Remote Sensing
2072-4292
2023-03-01
volume 15
issue 5
article number 1432
doi 10.3390/rs15051432
Scale-Invariant Multi-Level Context Aggregation Network for Weakly Supervised Building Extraction
Jicheng Wang (Key Laboratory of Land Resources Evaluation and Monitoring in Southwest China of Ministry of Education, Sichuan Normal University, Chengdu 610068, China)
Xin Yan (Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China)
Li Shen (Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China)
Tian Lan (Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China)
Xunqiang Gong (Key Laboratory of Mine Environmental Monitoring and Improving around Poyang Lake of Ministry of Natural Resources, East China University of Technology, Nanchang 330013, China)
Zhilin Li (Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China)
https://www.mdpi.com/2072-4292/15/5/1432
building extraction
high-resolution remote sensing image
weakly supervised semantic segmentation
self-attentive aggregation
class activation map
title Scale-Invariant Multi-Level Context Aggregation Network for Weakly Supervised Building Extraction
topic building extraction
high-resolution remote sensing image
weakly supervised semantic segmentation
self-attentive aggregation
class activation map
url https://www.mdpi.com/2072-4292/15/5/1432
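
Note on the IoU figures quoted in the description: intersection over union (IoU) is the number of pixels labeled "building" in both the predicted and the reference mask, divided by the number labeled "building" in either. The sketch below is a minimal illustration of that metric only, not the authors' evaluation code. The function name `iou`, the nested-list mask format, and the choice to score two empty masks as 1.0 are assumptions made for this example.

```python
def iou(pred, target):
    """Intersection over union of two same-shaped binary masks.

    Masks are nested lists of 0/1 values (a hypothetical format chosen
    for this example; real pipelines typically use arrays).
    """
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1

    # Assumed convention: two empty masks agree perfectly.
    if union == 0:
        return 1.0
    return intersection / union


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are set in at least one mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = iou(pred, target)  # 2 / 4 = 0.5
```

Read this way, an IoU of 0.6339 means the overlap between predicted and reference building pixels covers about 63% of their combined area.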