On the Robustness and Generalization Ability of Building Footprint Extraction on the Example of SegNet and Mask R-CNN

Building footprint (BFP) extraction focuses on the precise pixel-wise segmentation of buildings from aerial photographs such as satellite images. BFP extraction is an essential task in remote sensing and represents the foundation for many higher-level analysis tasks, such as disaster management, mon...

Bibliographic Details
Main Authors: Muntaha Sakeena, Eric Stumpe, Miroslav Despotovic, David Koch, Matthias Zeppelzauer
Format: Article
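Language: English
Published: MDPI AG 2023-04-01
Series: Remote Sensing
Subjects: building footprint extraction; satellite image segmentation; building detection; comparative study; robustness evaluation
Online Access: https://www.mdpi.com/2072-4292/15/8/2135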
author Muntaha Sakeena
Eric Stumpe
Miroslav Despotovic
David Koch
Matthias Zeppelzauer
collection DOAJ
description Building footprint (BFP) extraction focuses on the precise pixel-wise segmentation of buildings from aerial photographs such as satellite images. BFP extraction is an essential task in remote sensing and represents the foundation for many higher-level analysis tasks, such as disaster management and monitoring of city development. Building footprint extraction is challenging because buildings can have different sizes, shapes, and appearances both within the same region and across different regions of the world. In addition, effects such as occlusions, shadows, and bad lighting must also be considered and compensated for. A rich body of work on BFP extraction has been presented in the literature, and promising research results have been reported on benchmark datasets. Despite this comprehensive work, it is still unclear how robust and generalizable state-of-the-art methods are across different regions, cities, settlement structures, and densities. The purpose of this study is to close this gap by investigating questions on the practical applicability of BFP extraction. In particular, we evaluate the robustness and generalizability of state-of-the-art methods as well as their transfer learning capabilities. To this end, we investigate in detail two of the most popular deep learning architectures for BFP extraction (i.e., SegNet, an encoder–decoder-based architecture, and Mask R-CNN, an object detection architecture) and evaluate them with respect to different aspects on a proprietary high-resolution satellite image dataset as well as on publicly available datasets. Results show that both networks generalize well to new data, new cities, and across cities from different continents. They both benefit from increased training data, especially when this data is from the same distribution (data source) or of comparable resolution. Transfer learning from a data source with different recording parameters is not always beneficial.
first_indexed 2024-03-11T04:34:05Z
format Article
id doaj.art-767c2484afbf44398af54aa94a0a1690
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-03-11T04:34:05Z
publishDate 2023-04-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-767c2484afbf44398af54aa94a0a1690; 2023-11-17T21:12:32Z; eng; MDPI AG; Remote Sensing; ISSN 2072-4292; 2023-04-01; vol. 15, no. 8, art. 2135; doi:10.3390/rs15082135
On the Robustness and Generalization Ability of Building Footprint Extraction on the Example of SegNet and Mask R-CNN
Muntaha Sakeena (Institute of Creative Media Technologies, St. Pölten University of Applied Sciences, Campus-Platz 1, A-3100 St. Pölten, Austria)
Eric Stumpe (Institute of Creative Media Technologies, St. Pölten University of Applied Sciences, Campus-Platz 1, A-3100 St. Pölten, Austria)
Miroslav Despotovic (Institute for Energy, Facility & Real Estate Management, University of Applied Sciences Kufstein Tirol, Andreas Hofer-Strasse 7, A-6330 Kufstein, Austria)
David Koch (Institute for Energy, Facility & Real Estate Management, University of Applied Sciences Kufstein Tirol, Andreas Hofer-Strasse 7, A-6330 Kufstein, Austria)
Matthias Zeppelzauer (Institute of Creative Media Technologies, St. Pölten University of Applied Sciences, Campus-Platz 1, A-3100 St. Pölten, Austria)
https://www.mdpi.com/2072-4292/15/8/2135
building footprint extraction; satellite image segmentation; building detection; comparative study; robustness evaluation
title On the Robustness and Generalization Ability of Building Footprint Extraction on the Example of SegNet and Mask R-CNN
topic building footprint extraction
satellite image segmentation
building detection
comparative study
robustness evaluation
url https://www.mdpi.com/2072-4292/15/8/2135