Optimisation for efficient deep learning
Main Author: | Paren, AJ |
---|---|
Other Authors: | Poudel, R |
Format: | Thesis |
Language: | English |
Published: | 2022 |
Subjects: | Machine learning |
_version_ | 1826310346210541568 |
---|---|
author | Paren, AJ |
author2 | Poudel, R |
author_facet | Poudel, R Paren, AJ |
author_sort | Paren, AJ |
collection | OXFORD |
description | <p>Over the past 10 years there has been a huge advance in the performance of deep neural networks on many supervised learning tasks. Over this period these models have redefined the state of the art numerous times on many classic machine vision and natural language processing benchmarks. Deep neural networks have also found their way into many real-world applications, including chatbots, art generation, voice-activated virtual assistants, surveillance, and medical diagnosis systems. Much of the improved performance of these models can be attributed to an increase in scale, which in turn has raised computation and energy costs.</p>
<p>In this thesis we detail approaches to reducing the cost of deploying deep neural networks in various settings. We first focus on training efficiency, and to that end we present two optimisation techniques that produce high-accuracy models without extensive tuning. These optimisers have only a single fixed maximal step-size hyperparameter to cross-validate, and we demonstrate that they outperform comparable methods in a wide range of settings. They do not require the onerous process of finding a good learning-rate schedule, which often involves training many versions of the same network, and hence they reduce the computation needed. The first of these optimisers is a novel bundle method designed for the interpolation setting. The second demonstrates the effectiveness of a Polyak-like step size combined with an online estimate of the optimal loss value in the non-interpolating setting (sketched below).</p>
<p>Next, we turn our attention to training efficient binary networks with both binary parameters and binary activations. With the right implementation, fully binary networks are highly efficient at inference time, as they can replace the majority of operations with cheaper bit-wise alternatives. This makes them well suited to lightweight or embedded applications. Due to the discrete nature of these models, conventional training approaches are not viable. We present a simple and effective alternative to the existing optimisation techniques for these models.</p> |
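The abstract mentions a Polyak-like step size paired with an online estimate of the optimal loss value, but the record does not spell out the update. As a minimal illustrative sketch only (assuming a PyTorch-style setup; the function name, the fixed `max_lr` cap, and the externally supplied `f_star_estimate` are assumptions, not the thesis's actual algorithm), a classical Polyak step scales the gradient by the gap between the current loss and the estimated optimal loss:

```python
import torch

def polyak_sgd_step(params, loss, f_star_estimate, max_lr=1.0):
    """One SGD update with a Polyak-like step size (illustrative sketch).

    The step is (loss - f_star) / ||grad||^2, capped by a single fixed
    maximal step size; `f_star_estimate` stands in for an online estimate
    of the optimal loss value.
    """
    grads = torch.autograd.grad(loss, params)
    grad_sq_norm = sum((g * g).sum() for g in grads)
    gap = loss.detach() - f_star_estimate
    # Polyak step: loss gap over squared gradient norm, clipped to [0, max_lr].
    step = torch.clamp(gap / (grad_sq_norm + 1e-12), min=0.0, max=max_lr)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(step * g)
```

Capping the step by a single fixed maximal value mirrors the abstract's point that only one step-size hyperparameter needs to be cross-validated.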
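The claim that fully binary networks can replace most arithmetic with cheaper bit-wise operations at inference time can be made concrete with the standard XNOR-popcount trick; this is a generic sketch of the idea, not the implementation described in the thesis:

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two length-n {-1, +1} vectors packed as bits (1 -> +1, 0 -> -1).

    XNOR marks the positions where the signs agree, so the dot product is
    (#agreements) - (#disagreements) = 2 * popcount(xnor) - n.
    """
    mask = (1 << n) - 1                  # keep only the n packed bits
    xnor = ~(a_bits ^ b_bits) & mask     # 1 wherever the two sign bits agree
    agreements = bin(xnor).count("1")    # popcount
    return 2 * agreements - n

# Example: a = [+1, -1, +1, +1], b = [+1, +1, -1, +1]  ->  dot product 0
a_bits = 0b1101  # bit i encodes element i (least significant bit first)
b_bits = 0b1011
assert binary_dot(a_bits, b_bits, 4) == 0
```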
first_indexed | 2024-03-07T07:50:35Z |
format | Thesis |
id | oxford-uuid:d1687acd-72b6-4c19-9dcb-ab521dbea830 |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T07:50:35Z |
publishDate | 2022 |
record_format | dspace |
spelling | oxford-uuid:d1687acd-72b6-4c19-9dcb-ab521dbea8302023-07-10T13:03:31ZOptimisation for efficient deep learningThesishttp://purl.org/coar/resource_type/c_db06uuid:d1687acd-72b6-4c19-9dcb-ab521dbea830Machine learningEnglishHyrax Deposit2022Paren, AJPoudel, RKumar, MMudigonda, PZisserman, ADe, SHenriques, JBerrada, L |
spellingShingle | Machine learning Paren, AJ Optimisation for efficient deep learning |
title | Optimisation for efficient deep learning |
title_full | Optimisation for efficient deep learning |
title_fullStr | Optimisation for efficient deep learning |
title_full_unstemmed | Optimisation for efficient deep learning |
title_short | Optimisation for efficient deep learning |
title_sort | optimisation for efficient deep learning |
topic | Machine learning |
work_keys_str_mv | AT parenaj optimisationforefficientdeeplearning |