Scale-Invariant Multi-Level Context Aggregation Network for Weakly Supervised Building Extraction
Main Authors: | Jicheng Wang, Xin Yan, Li Shen, Tian Lan, Xunqiang Gong, Zhilin Li |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-03-01 |
Series: | Remote Sensing |
Online Access: | https://www.mdpi.com/2072-4292/15/5/1432 |
author | Jicheng Wang, Xin Yan, Li Shen, Tian Lan, Xunqiang Gong, Zhilin Li |
collection | DOAJ |
description | Weakly supervised semantic segmentation (WSSS) methods, which use only image-level annotations, are gaining popularity for automated building extraction because they eliminate the need for costly, time-consuming pixel-level labeling. Class activation maps (CAMs) are central to weakly supervised methods: they are used to generate pseudo-pixel-level labels for training semantic segmentation networks. However, CAMs activate only the most discriminative regions, which leads to inaccurate and incomplete results. To alleviate this, we propose a scale-invariant multi-level context aggregation network that improves the fineness and completeness of CAMs. The proposed method integrates two novel modules into a Siamese network: (a) a self-attentive multi-level context aggregation module that generates multi-level CAMs and attentively aggregates them into fine-structured CAMs, and (b) a scale-invariant optimization module that combines mutual learning with coarse-to-fine optimization to improve the completeness of CAMs. Experiments on two open building datasets show that our method achieves new state-of-the-art building extraction results using only image-level labels, producing more complete and accurate CAMs, with an IoU of 0.6339 on the WHU dataset and 0.5887 on the Chicago dataset. |
first_indexed | 2024-03-11T07:11:48Z |
id | doaj.art-b49503f3786f49d5adeea662671a864a |
institution | Directory Open Access Journal |
issn | 2072-4292 |
last_indexed | 2024-03-11T07:11:48Z |
doi | 10.3390/rs15051432 |
citation | Remote Sensing, vol. 15, no. 5, article 1432 (2023) |
affiliations | Jicheng Wang: Key Laboratory of Land Resources Evaluation and Monitoring in Southwest China of Ministry of Education, Sichuan Normal University, Chengdu 610068, China; Xin Yan, Li Shen, Tian Lan, Zhilin Li: Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China; Xunqiang Gong: Key Laboratory of Mine Environmental Monitoring and Improving around Poyang Lake of Ministry of Natural Resources, East China University of Technology, Nanchang 330013, China |
title | Scale-Invariant Multi-Level Context Aggregation Network for Weakly Supervised Building Extraction |
topic | building extraction; high-resolution remote sensing image; weakly supervised semantic segmentation; self-attentive aggregation; class activation map |
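The abstract above rests on two standard ingredients: class activation maps, which turn an image-level "building" classifier into per-pixel scores that can be thresholded into pseudo-labels, and IoU, the metric in which the WHU and Chicago results are reported. The following is a minimal NumPy sketch of both, assuming the classic single-level CAM formulation (a ReLU'd, max-normalized sum of the last convolutional feature maps weighted by the target class's classifier weights). It does not reproduce the authors' Siamese network, self-attentive multi-level aggregation, or scale-invariant optimization module. The function names, the 0.5 threshold, and the toy arrays are hypothetical illustrations.

```python
import numpy as np


def class_activation_map(features: np.ndarray, class_weights: np.ndarray) -> np.ndarray:
    """Classic CAM: classifier-weighted sum of the final convolutional feature maps.

    features: (K, H, W) feature maps from the last convolutional layer.
    class_weights: (K,) linear-classifier weights for the target class (e.g. "building").
    Returns an (H, W) map rescaled to [0, 1].
    """
    cam = np.tensordot(class_weights, features, axes=([0], [0]))  # sum_k w_k * f_k -> (H, W)
    cam = np.maximum(cam, 0.0)  # keep only positive evidence for the class
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def pseudo_label(cam: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize a normalized CAM into a building (1) / background (0) pseudo-label.

    The 0.5 threshold is a hypothetical default, not a value taken from the paper.
    """
    return (cam >= threshold).astype(np.uint8)


def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union of two binary masks for the building class."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, target).sum() / union)


# Toy example (made-up values): K = 2 feature maps on a 2 x 2 grid.
features = np.array([[[1.0, 0.0], [0.0, 0.0]],
                     [[0.0, 1.0], [0.0, 0.0]]])
weights = np.array([1.0, 0.5])
ground_truth = np.array([[1, 0], [1, 0]], dtype=np.uint8)

cam = class_activation_map(features, weights)   # [[1.0, 0.5], [0.0, 0.0]]
mask = pseudo_label(cam)                        # [[1, 1], [0, 0]]
print(iou(mask, ground_truth))                  # 1 shared pixel / 3 pixels in the union
```

The toy CAM also illustrates the failure mode the abstract describes: the strongest activation sits on one discriminative pixel, so the pseudo-label misses part of the true building footprint and the IoU stays low. Completing that footprint is the gap the paper's modules target.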