City-scale Cross-view Geolocalization with Generalization to Unseen Environments


Bibliographic Details
Main Author: Downes, Lena M.
Other Authors: How, Jonathan P.
Format: Thesis
Published: Massachusetts Institute of Technology 2024
Online Access: https://hdl.handle.net/1721.1/153793
https://orcid.org/0000-0001-5753-0617
collection MIT
description Cross-view geolocalization, a supplement or replacement for GPS, estimates an agent's global position by matching a local ground image to an overhead satellite image. Reliably matching these two sets of images is challenging, in part because their viewpoints differ significantly. Existing works have demonstrated geolocalization in constrained scenarios over small areas using panoramic cameras, yielding methods that generalize poorly to unseen environments or conditions and that do not quantify uncertainty. This thesis presents Wide-Area Geolocalization (WAG) and Restricted FOV Wide-Area Geolocalization (ReWAG), which combine a neural network with a particle filter to estimate the global position of a moving agent in a GPS-denied environment while scaling efficiently to city-sized regions in unseen environments and working with either panoramic or non-panoramic cameras. One contribution is a trinomial loss function that enables accurate and computationally efficient localization across city-scale search areas of nearly 300 km^2 by improving image retrieval on the off-center image pairs that result from a coarsely discretized satellite image database. Another contribution is a computationally efficient method to incorporate pose information with input image pairs, which improves localization accuracy with non-panoramic cameras and off-center ground images. An additional contribution is the GKL uncertainty measure for localization outputs, which enables detection of particle filter false convergence through characterization of the particle distribution. The final contribution is a demonstration of ReWAG's ability to generalize across different times of day, seasons, weather, and cameras on data collected from a moving car in Cambridge, Massachusetts, as well as the public release of a challenging imagery dataset collected on this vehicle platform. WAG and ReWAG reduce localization error from over 1 km to less than 100 m while performing particle filter updates with less than 1% of the computation required by previous approaches.
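The description above pairs a learned image-matching network with a particle filter over a coarsely discretized satellite image database. The following is a minimal illustrative sketch of that general idea, not the thesis's implementation: the function names, the exponential similarity-to-likelihood mapping, the `temperature` parameter, and the use of systematic resampling are all assumptions made for illustration. It scores the ground-image embedding against each satellite tile once, then weights every particle by the score of the tile that contains it.

```python
import numpy as np


def nearest_tile_index(particles, origin, tile_size, grid_shape):
    """Return the flat index of the grid tile containing each particle (x, y)."""
    cols = ((particles[:, 0] - origin[0]) // tile_size).astype(int)
    rows = ((particles[:, 1] - origin[1]) // tile_size).astype(int)
    cols = np.clip(cols, 0, grid_shape[1] - 1)
    rows = np.clip(rows, 0, grid_shape[0] - 1)
    return rows * grid_shape[1] + cols


def measurement_update(weights, particles, ground_emb, tile_embs,
                       origin, tile_size, grid_shape, temperature=0.1):
    """Reweight particles by ground-to-satellite embedding similarity.

    The similarity is computed once per tile rather than once per particle,
    so the cost of the update depends on the grid size, not the particle count.
    The exp(sim / temperature) likelihood is an assumed, hypothetical choice.
    """
    g = ground_emb / np.linalg.norm(ground_emb)
    t = tile_embs / np.linalg.norm(tile_embs, axis=1, keepdims=True)
    similarity = t @ g                      # cosine similarity per tile
    likelihood = np.exp(similarity / temperature)
    idx = nearest_tile_index(particles, origin, tile_size, grid_shape)
    new_weights = weights * likelihood[idx]
    return new_weights / new_weights.sum()


def systematic_resample(particles, weights, rng):
    """Draw a new, equally weighted particle set using systematic resampling."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                    # guard against round-off
    idx = np.searchsorted(cumulative, positions)
    return particles[idx], np.full(n, 1.0 / n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid_shape = (4, 4)                     # 4 x 4 tiles, 100 m each (toy values)
    tile_embs = np.eye(16)                  # stand-in for network embeddings
    particles = rng.uniform(0.0, 400.0, size=(500, 2))
    weights = np.full(500, 1.0 / 500)
    weights = measurement_update(weights, particles, tile_embs[5], tile_embs,
                                 (0.0, 0.0), 100.0, grid_shape)
    particles, weights = systematic_resample(particles, weights, rng)
    print(particles.mean(axis=0))
```

In a real system the tile embeddings would come from the satellite branch of a trained network and a motion model would propagate the particles between updates; both are omitted here.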
first_indexed 2024-09-23T11:10:06Z
format Thesis
id mit-1721.1/153793
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T11:10:06Z
publishDate 2024
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/153793 2024-03-16T03:43:21Z City-scale Cross-view Geolocalization with Generalization to Unseen Environments Downes, Lena M. How, Jonathan P. Steiner III, Theodore J. Massachusetts Institute of Technology. Department of Aeronautics and Astronautics Ph.D. 2024-03-15T19:24:25Z 2024-03-15T19:24:25Z 2024-02 2024-02-16T20:55:50.865Z Thesis https://hdl.handle.net/1721.1/153793 https://orcid.org/0000-0001-5753-0617 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology