Image matching based indoor localization and navigation for mobile users

Bibliographic Details
Main Author: Tao, Qingyi
Other Authors: Cai Jianfei
Format: Final Year Project (FYP)
Language: English
Published: 2015
Subjects:
Online Access: http://hdl.handle.net/10356/62877
Description
Summary: Unlike sensor-based indoor localization approaches, the vision-based approach to mobile indoor localization does not rely on hardware infrastructure and is therefore scalable and inexpensive. Two key technical areas in implementing a visual-search-based indoor navigation system are: 1) efficient and accurate image retrieval and 2) 3D model reconstruction from images. This report discusses the techniques available for each stage of the image retrieval process. Using earlier research results on datasets such as paintings and landmarks as a benchmark, experiments are run on an indoor dataset to demonstrate the feasibility of visual-search-based indoor navigation. For feature extraction, the traditional Scale-Invariant Feature Transform (SIFT) [1] descriptor is found to be the most stable, but it is too slow for indoor navigation. Hence, the Block based Frequency Domain Laplacian of Gaussian (BFLog) [2] is used to speed up the traditional detector in the SIFT algorithm. For global feature generation, the Scalable Compressed Fisher Vector (SCFV) [3] slightly outperforms Bag-of-Words (BoW) [4], Fisher Vector (FV) [5] and Vector of Locally Aggregated Descriptors (VLAD) [6]. However, if the retrieved image is determined purely by global feature matching, the precision falls short of what an indoor navigation system requires. The top-ranked images should therefore be re-ranked with local feature matching to obtain good matching accuracy. The measured precision exceeds 80% on the indoor dataset. 3D model reconstruction is achieved by creating a point cloud with PhotoSynth [7]. The user position is derived by solving for the camera pose from the correspondences between reference points and query points. An iOS application is developed based on these visual search methods. An interactive user interface with voice input and augmented reality is designed to enhance the user experience.
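
The retrieval stage described in the summary (a shortlist built from global features, then re-ranked by local feature matching) can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the report's implementation: it uses plain VLAD [6] rather than SCFV, random vectors stand in for SIFT/BFLog descriptors, and the local matching criterion is assumed to be a nearest-neighbour ratio test. The function names `vlad`, `count_ratio_matches` and `retrieve`, the image names, and all parameter values are hypothetical.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalised VLAD vector."""
    # Assign every descriptor to its nearest codebook centroid.
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    v = np.zeros_like(centroids, dtype=float)
    for k in range(len(centroids)):
        members = descriptors[nearest == k]
        if len(members):
            # Sum of residuals between descriptors and their centroid.
            v[k] = (members - centroids[k]).sum(axis=0)
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def count_ratio_matches(query_desc, ref_desc, ratio=0.8):
    """Count query descriptors that pass a nearest-neighbour ratio test.

    Assumes ref_desc holds at least two descriptors.
    """
    d = np.linalg.norm(query_desc[:, None, :] - ref_desc[None, :, :], axis=2)
    two_best = np.sort(d, axis=1)[:, :2]
    return int((two_best[:, 0] < ratio * two_best[:, 1]).sum())

def retrieve(query_desc, database, centroids, shortlist=3):
    """Return database image names, best match first.

    database is a list of (name, local_descriptors) pairs.
    """
    q = vlad(query_desc, centroids)
    # Stage 1: rank by cosine similarity of the global (VLAD) vectors.
    sims = [float(q @ vlad(desc, centroids)) for _, desc in database]
    top = [int(i) for i in np.argsort(sims)[::-1][:shortlist]]
    # Stage 2: re-rank the shortlist by the number of local matches.
    top.sort(key=lambda i: count_ratio_matches(query_desc, database[i][1]),
             reverse=True)
    return [database[i][0] for i in top]

# Toy data: random vectors stand in for real local descriptors (assumption).
rng = np.random.default_rng(0)
centroids = rng.normal(size=(4, 8))           # hypothetical 4-word codebook
database = [(name, rng.normal(size=(20, 8)))  # hypothetical reference images
            for name in ("lobby", "corridor", "lab")]
# The query is a slightly perturbed view of the "corridor" image.
query = database[1][1] + 0.01 * rng.normal(size=(20, 8))
ranking = retrieve(query, database, centroids, shortlist=3)
print(ranking)
```

In a real system the shortlist would be much smaller than the database, so the more expensive local matching runs on only a handful of candidate images.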