On-the-fly visual category search in web-scale image collections


Bibliographic Details
Main Author: Chatfield, K
Other Authors: Zisserman, A
Format: Thesis
Language: English
Published: 2014
Subjects: Information engineering
collection OXFORD
description <p>This thesis tackles the problem of large-scale visual search for categories within large collections of images. Given a textual description of a visual category, such as 'car' or 'person', the objective is to retrieve images containing that category from the corpus quickly and accurately, and without the need for auxiliary metadata or, crucially and in contrast to previous approaches, expensive pre-training.</p> <p>The general approach to identifying different visual categories within a dataset is to train classifiers over features extracted from a set of training images. The performance of such classifiers relies heavily on sufficiently discriminative image representations, and many methods have been proposed which involve aggregating local appearance features into rich bag-of-words encodings. We begin by conducting a comprehensive evaluation of the latest such encodings, identifying best-of-breed practices for training powerful visual models using these representations. We also contrast these methods with the latest Convolutional Network (ConvNet) based features, thus developing a state-of-the-art architecture for large-scale image classification.</p> <p>Following this, we explore how a standard classification pipeline can be adapted for use in a real-time setting. One of the major issues, particularly with bag-of-words based methods, is the high dimensionality of the encodings, which causes ranking over large datasets to be prohibitively expensive. We therefore assess different methods for compressing such features, and further propose a novel cascade approach to ranking which both reduces ranking time and improves retrieval performance.</p> <p>Finally, we explore the problem of training visual models on-the-fly, making use of visual data dynamically collected from the web to train classifiers on demand. On this basis, we develop a novel GPU architecture for on-the-fly visual category search which is capable of retrieving previously unknown categories over unannotated datasets of millions of images in just a few seconds.</p>
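The bag-of-words encodings evaluated in the first part of the thesis can be illustrated with a minimal hard-assignment sketch: each local appearance descriptor is assigned to its nearest visual word in a pre-learned vocabulary, and the assignments are pooled into a normalised histogram. This is only the simplest member of the family the thesis evaluates; all names, dimensions, and the random data below are illustrative.

```python
import numpy as np

def bow_encode(descriptors, vocabulary):
    """Hard-assignment bag-of-words: map each local descriptor to its
    nearest visual word and pool into an L2-normalised histogram."""
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # nearest-word index per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

rng = np.random.default_rng(0)
vocab = rng.normal(size=(64, 128))   # 64 visual words, 128-D local descriptors
desc = rng.normal(size=(200, 128))   # 200 local descriptors from one image
enc = bow_encode(desc, vocab)
print(enc.shape)  # prints (64,)
```

The resulting fixed-length vector is what a linear classifier is trained over; richer encodings in this family replace the hard counting with higher-order aggregation statistics, at the cost of much higher dimensionality.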
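The cascade ranking idea in the second part, namely score the whole corpus cheaply using compressed features and re-rank only a shortlist with the full-precision encodings, can be sketched as follows. The sign-binarisation used here stands in for the compression schemes the thesis actually assesses and is purely illustrative.

```python
import numpy as np

def cascade_rank(w, full_feats, top_k=100):
    """Two-stage cascade: a cheap pass over 1-bit compressed features
    prunes the corpus, then full-precision features re-rank the shortlist."""
    compressed = np.sign(full_feats)         # crude 1-bit-per-dimension codes
    coarse = compressed @ np.sign(w)         # fast approximate scores
    shortlist = np.argsort(-coarse)[:top_k]  # best candidates from cheap pass
    fine = full_feats[shortlist] @ w         # exact scores on shortlist only
    return shortlist[np.argsort(-fine)]      # final ranking, best first

rng = np.random.default_rng(1)
w = rng.normal(size=32)                 # hypothetical trained classifier weights
corpus = rng.normal(size=(1000, 32))    # hypothetical corpus feature matrix
top = cascade_rank(w, corpus, top_k=50)
```

The expensive full-precision dot products are computed for only `top_k` items instead of the whole corpus, which is where the ranking-time saving comes from.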
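The on-the-fly stage can be summarised as: fetch positive training images for the query text from a web image search, compute their features, train a linear classifier against a fixed pool of negative features, and rank the corpus by classifier score. The sketch below substitutes a few steps of logistic-regression gradient descent for the classifier and random vectors for real image features; the image downloading and GPU components are omitted, and every name here is an assumption, not the thesis's API.

```python
import numpy as np

def train_on_the_fly(pos, neg, epochs=200, lr=0.1):
    """Train a linear classifier (logistic regression, batch gradient
    descent) separating query positives from a fixed negative pool."""
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))  # sigmoid scores
        w -= lr * X.T @ (p - y) / len(y)                    # gradient step
    return w

def rank_corpus(w, corpus_feats):
    """Order corpus images by descending classifier score."""
    return np.argsort(-(corpus_feats @ w))

# Synthetic stand-ins: web positives, fixed negatives, and a corpus whose
# first 10 items resemble the positives.
rng = np.random.default_rng(2)
pos = rng.normal(0.5, 0.1, size=(50, 16))
neg = rng.normal(-0.5, 0.1, size=(200, 16))
w = train_on_the_fly(pos, neg)
corpus = np.vstack([rng.normal(0.5, 0.1, size=(10, 16)),
                    rng.normal(-0.5, 0.1, size=(90, 16))])
ranking = rank_corpus(w, corpus)
```

Because the negative pool and corpus features are precomputed, the only per-query work is fetching positives, training a small linear model, and one matrix-vector product over the corpus, which is what makes a few-second response over millions of images plausible.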
id oxford-uuid:cb26c472-b253-4fec-a88e-0c57fc9d70e7
institution University of Oxford
topic Information engineering