Optimisation for efficient deep learning
Main Author: | Paren, AJ |
---|---|
Other Authors: | Poudel, R |
Format: | Thesis |
Language: | English |
Published: | 2022 |
Subjects: | Machine learning |
_version_ | 1826310346210541568 |
---|---|
author | Paren, AJ |
author2 | Poudel, R |
author_facet | Poudel, R Paren, AJ |
author_sort | Paren, AJ |
collection | OXFORD |
description | <p>Over the past 10 years there has been a huge advance in the performance of deep neural networks on many supervised learning tasks. Over this period these models have redefined the state of the art numerous times on many classic machine vision and natural language processing benchmarks. Deep neural networks have also found their way into many real-world applications, including chatbots, art generation, voice-activated virtual assistants, surveillance, and medical diagnosis systems. Much of the improved performance of these models can be attributed to an increase in scale, which in turn has raised computation and energy costs.</p>
<p>In this thesis we detail approaches to reducing the cost of deploying deep neural networks in various settings. We first focus on training efficiency, and to that end we present two optimisation techniques that produce high-accuracy models without extensive tuning. These optimisers have only a single fixed maximal step-size hyperparameter to cross-validate, and we demonstrate that they outperform comparable methods in a wide range of settings. They do not require the onerous process of finding a good learning-rate schedule, which often involves training many versions of the same network, and hence they reduce the computation needed. The first of these optimisers is a novel bundle method designed for the interpolation setting. The second demonstrates the effectiveness of a Polyak-like step size combined with an online estimate of the optimal loss value in the non-interpolating setting (sketched below).</p>
<p>Next, we turn our attention to training efficient binary networks with both binary parameters and binary activations. With the right implementation, fully binary networks are highly efficient at inference time, as they can replace the majority of operations with cheaper bit-wise alternatives. This makes them well suited to lightweight or embedded applications. Due to the discrete nature of these models, conventional training approaches are not viable. We present a simple and effective alternative to the existing optimisation techniques for these models.</p> |
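The abstract mentions a Polyak-like step size paired with an online estimate of the optimal loss value, but the record does not spell out the update. As a minimal illustrative sketch only (assuming a PyTorch-style setup; the function name, the fixed `max_lr` cap, and the externally supplied `f_star_estimate` are assumptions, not the thesis's actual algorithm), a classical Polyak step scales the gradient by the gap between the current loss and the estimated optimal loss:

```python
import torch

def polyak_sgd_step(params, loss, f_star_estimate, max_lr=1.0):
    """One SGD update with a Polyak-like step size (illustrative sketch).

    The step is (loss - f_star) / ||grad||^2, capped by a single fixed
    maximal step size; `f_star_estimate` stands in for an online estimate
    of the optimal loss value.
    """
    grads = torch.autograd.grad(loss, params)
    grad_sq_norm = sum((g * g).sum() for g in grads)
    gap = loss.detach() - f_star_estimate
    # Polyak step: loss gap over squared gradient norm, clipped to [0, max_lr].
    step = torch.clamp(gap / (grad_sq_norm + 1e-12), min=0.0, max=max_lr)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(step * g)
```

Capping the step by a single fixed maximal value mirrors the abstract's point that only one step-size hyperparameter needs to be cross-validated.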
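The claim that fully binary networks can replace most arithmetic with cheaper bit-wise operations at inference time can be made concrete with the standard XNOR-popcount trick; this is a generic sketch of the idea, not the implementation described in the thesis:

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two length-n {-1, +1} vectors packed as bits (1 -> +1, 0 -> -1).

    XNOR marks the positions where the signs agree, so the dot product is
    (#agreements) - (#disagreements) = 2 * popcount(xnor) - n.
    """
    mask = (1 << n) - 1                  # keep only the n packed bits
    xnor = ~(a_bits ^ b_bits) & mask     # 1 wherever the two sign bits agree
    agreements = bin(xnor).count("1")    # popcount
    return 2 * agreements - n

# Example: a = [+1, -1, +1, +1], b = [+1, +1, -1, +1]  ->  dot product 0
a_bits = 0b1101  # bit i encodes element i (least significant bit first)
b_bits = 0b1011
assert binary_dot(a_bits, b_bits, 4) == 0
```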
first_indexed | 2024-03-07T07:50:35Z |
format | Thesis |
id | oxford-uuid:d1687acd-72b6-4c19-9dcb-ab521dbea830 |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T07:50:35Z |
publishDate | 2022 |
record_format | dspace |
spelling | oxford-uuid:d1687acd-72b6-4c19-9dcb-ab521dbea8302023-07-10T13:03:31ZOptimisation for efficient deep learningThesishttp://purl.org/coar/resource_type/c_db06uuid:d1687acd-72b6-4c19-9dcb-ab521dbea830Machine learningEnglishHyrax Deposit2022Paren, AJPoudel, RKumar, MMudigonda, PZisserman, ADe, SHenriques, JBerrada, L |
spellingShingle | Machine learning Paren, AJ Optimisation for efficient deep learning |
title | Optimisation for efficient deep learning |
title_full | Optimisation for efficient deep learning |
title_fullStr | Optimisation for efficient deep learning |
title_full_unstemmed | Optimisation for efficient deep learning |
title_short | Optimisation for efficient deep learning |
title_sort | optimisation for efficient deep learning |
topic | Machine learning |
work_keys_str_mv | AT parenaj optimisationforefficientdeeplearning |