From matrix factorisation to signal propagation in deep learning: algorithms and guarantees

Bibliographic Details
Main Author: Murray, M
Other Authors: Tanner, J
Format: Thesis
Language: English
Published: 2021
Subjects: Neural networks (Computer science); Compressed sensing (Telecommunication); Matrix factorisation; Applied mathematics
Collection: OXFORD
Description:

Many problems in data science amount to computing an appropriate representation of the data for the task at hand: two examples related to this thesis are 1) reducing memory, sensing or transmission costs by computing a sparse representation of the data, and 2) performing classification by transforming the data into a representation in which the members of different classes are linearly separable. The unifying theme of this thesis is the design and analysis of algorithms which are guaranteed, at least with high probability, to compute useful representations. We study this in two contexts: first, a matrix factorisation problem inspired by compressed sensing, and second, signal propagation in deep learning. In the first context we seek to recover sparse representations of the data from a set of linear measurements without access to the underlying sensing mechanism. In the second context we analyse both the forward and backward pass of deep neural networks, investigating how the choice of activation function impacts the sequences of representations and error vectors that these passes generate, respectively. In Chapter 1 we provide the necessary background to understand the contributions of this thesis, which we now summarise.

In Chapter 2 we study the problem of recovering the factors A and X from Y = AX, where A is an m x n sparse binary matrix with a fixed number d of nonzeros per column, and X is an n x N sparse real matrix whose columns have at most k nonzeros and are dissociated. Matrices defined by this factorisation can be expressed as a sum of n rank-one sparse matrices whose nonzero entries, under the appropriate permutation, form dense rectangles or blocks with a constant value in each column; we therefore refer to them as Permuted Striped Block (PSB) matrices. Motivated by multiple measurement vector combinatorial compressed sensing (MMV-CCS), we define the PSB model as a particular distribution over this set of matrices. For matrices drawn from the PSB model, we provide a computationally efficient factorisation algorithm which recovers the generating factors with high probability in n from a near-optimal number of vectors. We demonstrate the efficacy of this algorithm experimentally and highlight potential applications: in particular, our results show that in the MMV-CCS setting sparse recovery is possible using only knowledge of the distribution from which the encoder matrix is sampled, rather than full access to the encoder matrix itself.
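
To make the Chapter 2 setting concrete, the minimal sketch below draws an illustrative factor pair in the spirit of the PSB model and forms Y = AX: A is m x n and binary with exactly d nonzeros per column, while X is n x N with at most k nonzeros per column. The particular sampling choices here (uniformly random supports, Gaussian nonzero values) are assumptions made for illustration and are not necessarily the distribution defined in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_psb_factors(m, n, N, d, k):
    """Draw an illustrative (A, X) pair: A is m x n and binary with exactly d
    nonzeros per column; X is n x N with at most k nonzeros per column.
    Continuous (Gaussian) nonzeros make each column of X dissociated
    (all subset sums of its nonzeros distinct) with probability one."""
    A = np.zeros((m, n))
    for j in range(n):
        support = rng.choice(m, size=d, replace=False)  # column support, chosen uniformly
        A[support, j] = 1.0
    X = np.zeros((n, N))
    for j in range(N):
        support = rng.choice(n, size=k, replace=False)
        X[support, j] = rng.normal(size=k)              # illustrative nonzero values
    return A, X

m, n, N, d, k = 100, 400, 2000, 7, 5
A, X = sample_psb_factors(m, n, N, d, k)
Y = A @ X   # the observed matrix; the factorisation task is to recover A and X from Y alone
print(Y.shape)                              # (100, 2000)
print(np.count_nonzero(A, axis=0)[:5])      # [7 7 7 7 7]: d nonzeros in each column of A
```
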
In Chapter 3 we continue our study of sparse representations, but consider them instead in the context of deep convolutional neural networks (DCNNs). In particular, we investigate the impact of ReLU and related sparsifying activation functions on signal propagation by studying the representations and activation patterns computed by the forward pass at each layer. To this end we consider a variant of the approach proposed by Papyan et al., which interprets the forward pass of a DCNN as solving a sequence of sparse coding problems; we hence refer to this approach as Deep Convolutional Sparse Coding (DCSC). The benefit of this approach is that, by assuming a particular data model based on the notion of reversible networks and by omitting the learning of parameters, one can more transparently analyse the role of the activation function and the efficacy of the forward pass of a DCNN in recovering activation pathways. In prior work, Papyan et al. proved that representations with an activation density proportional to the ambient dimension of the data are recoverable. We extend these uniform guarantees and prove, with high probability, that representations with a far greater density of activations per layer are recoverable.

Finally, in Chapter 4 we continue our investigation into the impact of the choice of activation function on signal propagation, now in deep but non-convolutional neural networks. Diverging markedly from Chapter 3, however, we forgo any assumptions on the data and instead consider random networks, which are of particular relevance for studying the properties of networks at initialisation. In this chapter we prove that certain problems at initialisation, which stop or hinder subsequent training, can be avoided by ensuring that the activation function deployed has a sufficiently large linear region around the origin. We further demonstrate empirically that using such activation functions leads to tangible benefits in practice, both in terms of test and training accuracy and in terms of training time. These results also allow us to successfully train networks in new hyperparameter regimes, with much larger bias variances than have been used previously.
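
As a companion to Chapters 3 and 4, the sketch below passes an input through a random fully connected network at initialisation and reports the fraction of active units at each layer: once with ReLU (the sparsifying regime studied in Chapter 3) and once with an activation that is linear on an interval around the origin (in the spirit of Chapter 4). The fully connected architecture, the shifted-ReLU form of the second activation, and the chosen weight and bias variances are illustrative assumptions rather than the constructions analysed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(z, 0.0)

def shifted_relu(z, a=2.0):
    """Illustrative activation with a linear region of half-width a around the
    origin (it is the identity on [-a, a] and clips below -a). This specific
    form is an assumption, not necessarily the activation used in the thesis."""
    return np.maximum(z, -a)

def forward(x, widths, phi, sigma_w=1.0, sigma_b=0.5):
    """Forward pass of a random fully connected network at initialisation,
    returning the representation computed at each layer."""
    reps, h = [x], x
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, sigma_w / np.sqrt(n_in), size=(n_out, n_in))
        b = rng.normal(0.0, sigma_b, size=n_out)
        h = phi(W @ h + b)
        reps.append(h)
    return reps

widths = [64, 256, 256, 256, 256]
x = rng.normal(size=widths[0])
for name, phi in [("ReLU", relu), ("shifted ReLU", shifted_relu)]:
    reps = forward(x, widths, phi)
    density = [float(np.mean(np.abs(h) > 1e-12)) for h in reps[1:]]
    # ReLU zeroes roughly half the units at each layer, whereas the shifted
    # activation responds linearly near the origin and leaves far more units active.
    print(name, [round(p, 2) for p in density])
```
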
ID: oxford-uuid:4dced108-3635-407d-bc39-976498b7fa3b
Institution: University of Oxford