Representation of spatial transformations in deep neural networks


Bibliographic Details
Main Author: Lenc, K
Other Authors: Vedaldi, A
Format: Thesis
Language:English
Published: 2017
Subjects:
_version_ 1826283158599892992
author Lenc, K
author2 Vedaldi, A
author_facet Vedaldi, A
Lenc, K
author_sort Lenc, K
collection OXFORD
description <p>This thesis investigates the properties and abilities of a variety of computer vision representations with respect to spatial geometric transformations. Our approach is to employ machine learning methods to characterise the behaviour of existing image representations empirically, and to apply deep learning to new computer vision tasks where the underlying spatial information is important. The results further the understanding of modern computer vision representations, such as convolutional neural networks (CNNs) used in image classification and object detection, and enable their application to new domains such as local feature detection.</p> <p>Because our theoretical understanding of CNNs remains limited, we investigate two key mathematical properties of representations: equivariance (how transformations of the input image are encoded) and equivalence (whether two representations, for example two different parameterizations, layers or architectures, capture the same visual information). We propose a number of methods to establish these properties empirically. These methods reveal interesting aspects of representation structure, including clarifying at which layers of a CNN geometric invariances are achieved and how various CNN architectures differ. We identify several predictors of geometric and architectural compatibility, and demonstrate direct applications to structured-output regression.</p> <p>Local covariant feature detection has been difficult to approach with machine learning techniques. We propose the first fully general formulation for learning local covariant feature detectors, which casts detection as a regression problem and enables the use of powerful regressors such as deep neural networks. The derived <em>covariance constraint</em> can be used to learn automatically which visual structures provide stable anchors for local feature detection. We support these ideas theoretically and show that existing detectors can be derived in this framework. Additionally, in cooperation with Imperial College London, we introduce a novel large-scale dataset for evaluating local detectors and descriptors. It is suitable for training and testing modern local features and comes with strictly defined evaluation protocols for descriptors in tasks such as matching, retrieval and verification.</p> <p>The importance of pixel-wise image geometry for object detection has been unclear, as the best results were previously obtained by combining CNNs with cues from image segmentation. We propose a detector that uses a constant set of region proposals; although these approximate objects poorly, we show that a bounding-box regressor using intermediate convolutional features can recover sufficiently accurate bounding boxes, demonstrating that the required geometric information is contained in the CNN itself. Combined with other improvements, we obtain a fast and accurate detector that processes an image with the CNN alone.</p>
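The equivariance property mentioned in the abstract can be illustrated in a toy setting: given a representation phi and a transformation g, one asks whether a map M_g exists such that phi(g(x)) ≈ M_g phi(x), and fits M_g from data. The sketch below is illustrative only (not the thesis code): the sum-pooling "representation" and the flip "transformation" are assumptions chosen so that an exact linear M_g exists and the empirical test succeeds.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # toy input dimensionality

def phi(x):
    # toy "representation": sum-pool adjacent pairs (dim d -> d/2);
    # stands in for a CNN layer's feature map
    return x[0::2] + x[1::2]

def g(x):
    # toy "spatial transformation": flip the input
    return x[::-1]

# Collect feature pairs (phi(x), phi(g(x))) over random inputs.
X = rng.standard_normal((dim, 200))
F = phi(X)        # features of the originals, shape (4, 200)
Fg = phi(g(X))    # features of the transformed inputs

# Least-squares fit of the equivariance map: Fg ≈ M_g @ F
A, *_ = np.linalg.lstsq(F.T, Fg.T, rcond=None)
M_g = A.T

# A small residual means phi is (linearly) equivariant to g.
residual = np.linalg.norm(Fg - M_g @ F) / np.linalg.norm(Fg)
print(f"relative residual: {residual:.2e}")
```

Here the recovered M_g is the flip permutation of the pooled features, and the residual is near machine precision; for a real CNN layer the same fit quantifies *how* equivariant the representation is rather than giving an exact map.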
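The covariance constraint itself can be sketched in one dimension: a detector f returning a location should commute with translation, f(T_s x) = f(x) + s, and the squared violation of this identity is the kind of residual a learned detector would minimize. A minimal sketch follows, with an argmax "detector" standing in for the deep regressor; the names (f, covariance_loss) are illustrative assumptions, not from the thesis.

```python
import numpy as np

def f(x):
    # toy detector: location of the strongest response
    return int(np.argmax(x))

def covariance_loss(detector, x, s):
    # squared violation of the covariance constraint under a circular
    # shift by s; a trainable detector would minimize this over data
    x_shifted = np.roll(x, s)
    return (detector(x_shifted) - ((detector(x) + s) % len(x))) ** 2

rng = np.random.default_rng(1)
x = rng.standard_normal(32)
loss = covariance_loss(f, x, 5)
print(loss)  # argmax is covariant to circular shifts, so this is 0
```

The point of the formulation in the abstract is that the constraint alone supervises training: no hand-labelled keypoints are needed, only pairs of an image and its known transformation.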
first_indexed 2024-03-07T00:54:42Z
format Thesis
id oxford-uuid:87a16dc2-9d77-49c3-8096-cf3416fa6893
institution University of Oxford
language English
last_indexed 2024-03-07T00:54:42Z
publishDate 2017
record_format dspace
spelling oxford-uuid:87a16dc2-9d77-49c3-8096-cf3416fa68932022-03-26T22:11:56ZRepresentation of spatial transformations in deep neural networksThesishttp://purl.org/coar/resource_type/c_db06uuid:87a16dc2-9d77-49c3-8096-cf3416fa6893Computer visionEnglishORA Deposit2017Lenc, KVedaldi, AZisserman, ALepetit, V
spellingShingle Computer vision
Lenc, K
Representation of spatial transformations in deep neural networks
title Representation of spatial transformations in deep neural networks
title_full Representation of spatial transformations in deep neural networks
title_fullStr Representation of spatial transformations in deep neural networks
title_full_unstemmed Representation of spatial transformations in deep neural networks
title_short Representation of spatial transformations in deep neural networks
title_sort representation of spatial transformations in deep neural networks
topic Computer vision
work_keys_str_mv AT lenck representationofspatialtransformationsindeepneuralnetworks