Representation of spatial transformations in deep neural networks


Bibliographic Details
Main Author: Lenc, K
Other Authors: Vedaldi, A
Format: Thesis
Language:English
Published: 2017
Subjects:
_version_ 1826283158599892992
author Lenc, K
author2 Vedaldi, A
author_facet Vedaldi, A
Lenc, K
author_sort Lenc, K
collection OXFORD
description <p>This thesis investigates the properties and abilities of a variety of computer vision representations with respect to spatial geometric transformations. Our approach is to employ machine learning methods to characterise the behaviour of existing image representations empirically, and to apply deep learning to new computer vision tasks where the underlying spatial information is important. The results further the understanding of modern computer vision representations, such as convolutional neural networks (CNNs) used in image classification and object detection, and enable their application to new domains such as local feature detection.</p> <p>Because our theoretical understanding of CNNs remains limited, we investigate two key mathematical properties of representations: equivariance (how transformations of the input image are encoded) and equivalence (whether two representations, for example two different parameterizations, layers or architectures, capture the same visual information). We propose a number of methods to establish these properties empirically. These methods reveal interesting aspects of representation structure, including clarifying at which layers of a CNN geometric invariances are achieved and how various CNN architectures differ. We identify several predictors of geometric and architectural compatibility, and demonstrate direct applications to structured-output regression.</p> <p>Local covariant feature detection has been difficult to approach with machine learning techniques. We propose the first fully general formulation for learning local covariant feature detectors, which casts detection as a regression problem and enables the use of powerful regressors such as deep neural networks. The derived <em>covariance constraint</em> can be used to learn automatically which visual structures provide stable anchors for local feature detection. We support these ideas theoretically and show that existing detectors can be derived in this framework. Additionally, in cooperation with Imperial College London, we introduce a novel large-scale dataset for evaluating local detectors and descriptors. It is suitable for training and testing modern local features and comes with strictly defined evaluation protocols for descriptors in tasks such as matching, retrieval and verification.</p> <p>The importance of pixel-wise image geometry for object detection has been unclear, as the best results were previously obtained by combining CNNs with cues from image segmentation. We propose a detector that uses a constant set of region proposals; although these approximate objects poorly, we show that a bounding-box regressor using intermediate convolutional features can recover sufficiently accurate bounding boxes, demonstrating that the required geometric information is contained in the CNN itself. Combined with other improvements, we obtain a fast and accurate detector that processes an image with the CNN alone.</p>
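The equivariance property mentioned in the abstract can be illustrated in a toy setting: given a representation phi and a transformation g, one asks whether a map M_g exists such that phi(g(x)) ≈ M_g phi(x), and fits M_g from data. The sketch below is illustrative only (not the thesis code): the sum-pooling "representation" and the flip "transformation" are assumptions chosen so that an exact linear M_g exists and the empirical test succeeds.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # toy input dimensionality

def phi(x):
    # toy "representation": sum-pool adjacent pairs (dim d -> d/2);
    # stands in for a CNN layer's feature map
    return x[0::2] + x[1::2]

def g(x):
    # toy "spatial transformation": flip the input
    return x[::-1]

# Collect feature pairs (phi(x), phi(g(x))) over random inputs.
X = rng.standard_normal((dim, 200))
F = phi(X)        # features of the originals, shape (4, 200)
Fg = phi(g(X))    # features of the transformed inputs

# Least-squares fit of the equivariance map: Fg ≈ M_g @ F
A, *_ = np.linalg.lstsq(F.T, Fg.T, rcond=None)
M_g = A.T

# A small residual means phi is (linearly) equivariant to g.
residual = np.linalg.norm(Fg - M_g @ F) / np.linalg.norm(Fg)
print(f"relative residual: {residual:.2e}")
```

Here the recovered M_g is the flip permutation of the pooled features, and the residual is near machine precision; for a real CNN layer the same fit quantifies *how* equivariant the representation is rather than giving an exact map.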
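The covariance constraint itself can be sketched in one dimension: a detector f returning a location should commute with translation, f(T_s x) = f(x) + s, and the squared violation of this identity is the kind of residual a learned detector would minimize. A minimal sketch follows, with an argmax "detector" standing in for the deep regressor; the names (f, covariance_loss) are illustrative assumptions, not from the thesis.

```python
import numpy as np

def f(x):
    # toy detector: location of the strongest response
    return int(np.argmax(x))

def covariance_loss(detector, x, s):
    # squared violation of the covariance constraint under a circular
    # shift by s; a trainable detector would minimize this over data
    x_shifted = np.roll(x, s)
    return (detector(x_shifted) - ((detector(x) + s) % len(x))) ** 2

rng = np.random.default_rng(1)
x = rng.standard_normal(32)
loss = covariance_loss(f, x, 5)
print(loss)  # argmax is covariant to circular shifts, so this is 0
```

The point of the formulation in the abstract is that the constraint alone supervises training: no hand-labelled keypoints are needed, only pairs of an image and its known transformation.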
first_indexed 2024-03-07T00:54:42Z
format Thesis
id oxford-uuid:87a16dc2-9d77-49c3-8096-cf3416fa6893
institution University of Oxford
language English
last_indexed 2024-03-07T00:54:42Z
publishDate 2017
record_format dspace
spelling oxford-uuid:87a16dc2-9d77-49c3-8096-cf3416fa68932022-03-26T22:11:56ZRepresentation of spatial transformations in deep neural networksThesishttp://purl.org/coar/resource_type/c_db06uuid:87a16dc2-9d77-49c3-8096-cf3416fa6893Computer visionEnglishORA Deposit2017Lenc, KVedaldi, AZisserman, ALepetit, V
spellingShingle Computer vision
Lenc, K
Representation of spatial transformations in deep neural networks
title Representation of spatial transformations in deep neural networks
title_full Representation of spatial transformations in deep neural networks
title_fullStr Representation of spatial transformations in deep neural networks
title_full_unstemmed Representation of spatial transformations in deep neural networks
title_short Representation of spatial transformations in deep neural networks
title_sort representation of spatial transformations in deep neural networks
topic Computer vision
work_keys_str_mv AT lenck representationofspatialtransformationsindeepneuralnetworks