Cross-domain robustness in visual place recognition: an adversarial-based domain alignment approach


Bibliographic Details
Main Author: Lin, Yingying
Other Authors: Wang Dan Wei
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2024
Subjects: Engineering; Visual place recognition; Domain adaptation; Image retrieval; Adversarial network
Online Access: https://hdl.handle.net/10356/175346
author Lin, Yingying
author2 Wang Dan Wei
collection NTU
description Visual Place Recognition (VPR) is an important component of robotic sensing, used mainly in navigation systems such as those of autonomous vehicles. VPR enables large-scale localization by comparing the visual cues of the current query against a geo-tagged database of previously visited locations. The main approach in VPR is to use weakly supervised representation learning to generate compact but discriminative place descriptors, which are then used in image retrieval to find the closest matching location in the database. NetVLAD has achieved high recall on datasets sampled from the same distribution, such as when inference is conducted within the same city and under the same conditions. However, real applications may face significant challenges due to environmental changes, such as variations in illumination, viewpoint, and architectural style. These different image styles can be viewed as samples from another data distribution, or a different domain. A VPR model trained on a source-domain dataset may suffer a sharp decline in performance when tested on a different target domain. This dissertation aims to relieve this lack of cross-domain robustness and to enhance domain generalization capability. It focuses on improving the widely used NetVLAD architecture by employing Domain Adaptation strategies to obtain descriptors that are place-discriminative yet domain-invariant. An exhaustive theoretical exploration of VPR and Domain Adaptation is first conducted, identifying that resolving the inconsistency between geographic supervision and semantic information is key to better place descriptors. Meanwhile, mitigating the impact of domain-specific features relies most on adjusting the clustering centers and soft-alignment parameters in the NetVLAD aggregation layer.
Based on these theoretical insights, this work performs domain alignment by introducing small-scale target-domain guidance that adds prior information about the target domain during training, thus adapting the VPR model to new environments. Inspired by GAN principles for blurring domain-specific information and by Domain Adversarial Neural Networks (DANN), this dissertation proposes three levels of domain alignment: pixel level, local-feature level, and representation level. Each approach is analyzed theoretically and empirically, and their advantages and limitations are compared. Experimental results show that representation-level alignment most effectively meets the research objectives, outperforming both pixel-level and local-feature-level alignment. This improvement is attributed to its consistency with the essence of representation learning, its high task relevance, and its direct modification of descriptors, which together enhance target-domain robustness while preserving source-domain performance. To address some limitations of this modification, the dissertation also gives recommendations for future work, such as enriching dataset information or reducing model complexity to ensure generalization ability.
first_indexed 2024-10-01T05:43:16Z
format Thesis-Master by Coursework
id ntu-10356/175346
institution Nanyang Technological University
language English
last_indexed 2024-10-01T05:43:16Z
publishDate 2024
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/175346 2024-04-26T16:00:18Z Cross-domain robustness in visual place recognition: an adversarial-based domain alignment approach Lin, Yingying Wang Dan Wei School of Electrical and Electronic Engineering EDWWANG@ntu.edu.sg Engineering Visual place recognition Domain adaptation Image retrieval Adversarial network Master's degree 2024-04-22T04:38:38Z 2024-04-22T04:38:38Z 2024 Thesis-Master by Coursework Lin, Y. (2024). Cross-domain robustness in visual place recognition: an adversarial-based domain alignment approach. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175346 https://hdl.handle.net/10356/175346 en application/pdf Nanyang Technological University
title Cross-domain robustness in visual place recognition: an adversarial-based domain alignment approach
topic Engineering
Visual place recognition
Domain adaptation
Image retrieval
Adversarial network
url https://hdl.handle.net/10356/175346