Feature representation and learning methods in visual search applications

Bibliographic Details
Main Author: Manandhar, Dipu
Other Authors: Yap Kim Hui
Format: Thesis
Language: English
Published: 2019
Subjects:
Online Access: https://hdl.handle.net/10356/81689
http://hdl.handle.net/10220/47995
Description
Summary: This thesis studies and develops image feature representation and learning methods for visual search applications. We study both handcrafted features and deep learning based representations. Handcrafted feature-based methods are lightweight and do not require large amounts of training data. However, they have been outperformed by deep learning methods in various vision-related problems in recent years. Deep learning methods, in turn, generally require large amounts of data and computational power. In view of this, this thesis studies both handcrafted and deep learning methods for two selected domains, namely visual landmark search and visual fashion search and application. The first application develops algorithms for visual landmark search. The presence of repetitive patterns in landmark images causes a visual burstiness issue, which adversely affects the image representation. To tackle this, we propose a novel Lattice-Support Repetitive Local Feature Detection (LS-RLF) technique, which first effectively detects repetitive patterns present in images and then uses the detection information during image representation. As repetitive pattern detection is an early-vision problem and requires local feature analysis, we develop our algorithms using handcrafted local feature-based representations. We also present a new Feature Repetitiveness Similarity (FRS) metric, which quantizes the repetitive and unique features independently and matches them separately. The FRS metric makes use of the information in repetitive patterns to enhance the search while avoiding the visual burstiness issue. Experiments conducted on three benchmark datasets, namely the Oxford, Paris, and Inria Holidays datasets, show the effectiveness of the proposed methods. The second application studies feature representation methods for fashion images using deep learning.
We collect a new fashion dataset, the NTUBrandFashion (NBF) dataset, with 10K fashion images that are richly annotated with essential elements of fashion: categories, attributes, and brand. We propose a new brand-aware fashion search (BAFS) method, which takes user brand preference into account. This search method uses a deep feature encoding that leverages hierarchies of CNN activations to extract rich visual representations from clothing images. The brand-aware re-ranking in the BAFS framework further improves the search performance. We also propose a new Attribute-Supervised Metric Learning (ASML) method to learn discriminative embeddings from clothing images. This deep metric learning based method incorporates image attribute information to supervise the triplet network training. This serves two purposes: (i) mining informative triplets and (ii) treating the triplets in a soft manner based on their importance, which helps in capturing similarity at different levels. Experiments conducted on the NBF and DeepFashion datasets show the effectiveness of the proposed method. The third work studies methods for fashion trend analysis and popularity prediction based on visual analysis of clothing images. We develop visual representation methods for fashion images while capturing their trend information to predict their popularity in terms of click rates. We propose an image-based model and a sequence-based model using deep networks to predict the click rate of fashion items. The image-based model uses a CNN to predict clothing popularity. The sequence-based model takes a time sequence of clothing images, using CNNs to extract visual features and an RNN to model the trend information. To the best of our knowledge, this is the first work to explore visual information for fashion forecasting of individual items.
Experiments conducted on a dataset obtained from an online fashion company show promising results for fashion forecasting that outperform the recent comparative method.
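
The summary describes attribute information supervising triplet training and triplets being treated "in a soft manner based on their importance", but it does not give the formulation. The following is only a minimal sketch of that general idea, not the ASML method itself. The Jaccard-based importance weight, the margin value of 0.2, and all function names are assumptions introduced here for illustration.

```python
import math


def jaccard(a, b):
    """Jaccard overlap between two attribute sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def euclidean(x, y):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def soft_triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss scaled by an attribute-derived importance weight.

    Each argument is an (embedding, attributes) pair. The weighting scheme
    is a hypothetical choice for this sketch, not taken from the thesis.
    """
    (emb_a, attr_a), (emb_p, attr_p), (emb_n, attr_n) = anchor, positive, negative
    # Importance: how much more attribute overlap the positive has with the
    # anchor than the negative has. Clipped at zero.
    weight = max(0.0, jaccard(attr_a, attr_p) - jaccard(attr_a, attr_n))
    # Standard triplet hinge on embedding distances.
    hinge = max(0.0, euclidean(emb_a, emb_p) - euclidean(emb_a, emb_n) + margin)
    return weight * hinge


if __name__ == "__main__":
    anchor = ([0.0, 0.0], {"dress", "red"})
    positive = ([1.0, 0.0], {"dress", "red"})
    hard_negative = ([0.5, 0.0], {"dress", "blue"})
    print(soft_triplet_loss(anchor, positive, hard_negative))
```

In this sketch a triplet whose weight is zero contributes nothing to the loss, which loosely mirrors the idea of mining informative triplets. A negative that shares some attributes with the anchor is penalized less than one that shares none, so similarity is graded rather than binary.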