Optimisation for efficient deep learning

Full description

Over the past 10 years there have been huge advances in the performance of deep neural networks on many supervised learning tasks. Over this period these models have redefined the state of the art numerous times on many classic machine vision and natural language processing benchmarks. Deep neural networks have also found their way into many real-world applications, including chatbots, art generation, voice-activated virtual assistants, surveillance, and medical diagnosis systems. Much of the improved performance of these models can be attributed to an increase in scale, which in turn has raised computation and energy costs.

In this thesis we detail approaches for reducing the cost of deploying deep neural networks in various settings. We first focus on training efficiency, and to that end we present two optimisation techniques that produce high-accuracy models without extensive tuning. These optimisers have only a single fixed maximal step-size hyperparameter to cross-validate, and we demonstrate that they outperform other comparable methods in a wide range of settings. These approaches do not require the onerous process of finding a good learning-rate schedule, which often requires training many versions of the same network, and hence they reduce the computation needed. The first of these optimisers is a novel bundle method designed for the interpolation setting. The second demonstrates the effectiveness of a Polyak-like step size in combination with an online estimate of the optimal loss value in the non-interpolating setting.

Next, we turn our attention to training efficient binary networks with both binary parameters and activations. With the right implementation, fully binary networks are highly efficient at inference time, as they can replace the majority of operations with cheaper bit-wise alternatives. This makes them well suited for lightweight or embedded applications. Due to the discrete nature of these models, conventional training approaches are not viable. We present a simple and effective alternative to the existing optimisation techniques for these models.
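
The "Polyak-like step size" mentioned in the abstract builds on the classical Polyak rule, which divides the gap to the optimal loss value by the squared gradient norm, here capped by a single fixed maximal step size. The following PyTorch sketch illustrates only that general recipe under stated assumptions: the function name, the toy problem, and the constant f_star are illustrative, and the thesis's online estimation of the optimal loss is not reproduced.

    import torch

    def polyak_like_step(params, loss, max_lr=1.0, f_star=0.0):
        # Illustrative Polyak-type update (not the thesis's exact algorithm):
        #   eta = min(max_lr, (loss - f_star) / ||grad||^2)
        # f_star stands in for a (possibly online-estimated) optimal loss value.
        grads = torch.autograd.grad(loss, params)
        grad_sq_norm = sum(g.pow(2).sum().item() for g in grads)
        eta = min(max_lr, (loss.item() - f_star) / (grad_sq_norm + 1e-12))
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.sub_(eta * g)  # gradient step with the Polyak-type step size

    # Hypothetical usage on a least-squares toy problem:
    w = torch.zeros(3, requires_grad=True)
    x, y = torch.randn(8, 3), torch.randn(8)
    for _ in range(100):
        loss = ((x @ w - y) ** 2).mean()
        polyak_like_step([w], loss, max_lr=1.0, f_star=0.0)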

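To make the "cheaper bit-wise alternatives" concrete: when weights and activations are constrained to ±1 and packed into machine words, a dot product reduces to an XOR (or XNOR) followed by a population count. The plain-Python sketch below shows only this standard identity; it is not the thesis's implementation, and the helper names are made up for illustration.

    def pack_signs(values):
        # Pack a sequence of +1/-1 entries into an integer, one bit per entry
        # (bit set <=> entry is +1).
        bits = 0
        for i, v in enumerate(values):
            if v > 0:
                bits |= 1 << i
        return bits

    def binary_dot(a_bits, b_bits, n):
        # Dot product of two {-1, +1}^n vectors from their packed bits:
        # matching bits contribute +1, differing bits -1, hence
        #   dot = n - 2 * popcount(a XOR b).
        return n - 2 * bin(a_bits ^ b_bits).count("1")

    # Example: a = (+1, -1, +1), b = (+1, +1, -1)  ->  1 - 1 - 1 = -1
    a = pack_signs([1, -1, 1])
    b = pack_signs([1, 1, -1])
    assert binary_dot(a, b, n=3) == -1
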
Bibliographic Details

Main Author: Paren, AJ
Other Authors: Poudel, R; Kumar, M; Mudigonda, P; Zisserman, A; De, S; Henriques, J; Berrada, L
Format: Thesis
Language: English
Institution: University of Oxford
Published: 2022
Subjects: Machine learning