Cloud/edge inference for deep neural networks

Bibliographic Details
Main Author: Sbai, M
Other Authors: Trigoni, A
Format: Thesis
Language:English
Published: 2023
_version_ 1797111452509667328
author Sbai, M
author2 Trigoni, A
author_facet Trigoni, A
Sbai, M
author_sort Sbai, M
collection OXFORD
description <p>In the past decade, deep learning has become one of the most dominant data analysis approaches, owing to its ability to achieve impressively high accuracy on a variety of key computing tasks. Some of these tasks require intensive computation, especially when image and video data captured from cameras are input to the predictive model. In traditional systems such as video surveillance, these tasks are offloaded to a cloud server, which automatically processes the collected data using deep learning algorithms. This solution has limitations, most notably high latency and network load, which impact the performance of real-time applications. The alternative solution, embedding deep neural networks (DNNs) directly on the edge device (e.g. a camera), is severely limited by the device's computation and storage resources. More recently, research has examined approaches that distribute the inference of machine learning algorithms between the edge and the cloud. However, despite some promising results, several fundamental drawbacks remain in terms of accuracy, efficiency, and applicability to resource-constrained environments.</p> <p>The work presented in this thesis tackles these shortcomings by proposing a novel partitioning approach for DNN inference between the edge and the cloud. This is the first work to consider simultaneous optimization of both the memory usage at the edge and the size of the data to be transferred over the wireless link. The experiments were performed on two different network architectures, MobileNetV1 and VGG16. The proposed approach makes it possible to execute part of the network on very constrained devices (e.g. microcontrollers) and under poor network conditions (e.g. LoRa) whilst retaining reasonable accuracy.</p> <p>The second contribution deals with DNN distribution on a cloud/multi-edge execution platform. By including early exits and bottlenecks in the neural network, the proposed solution makes it possible to distribute computation according to time and accuracy requirements. In scenarios where communication with the cloud is very limited (e.g. LoRa), it allows an 80% improvement in average latency compared to a traditional cloud solution.</p> <p>Finally, we deal with power-constrained devices. We propose an energy consumption estimation and optimization approach for DNNs with multiple early exits in a platform composed of several edge devices and a cloud server. The approach utilizes machine learning combined with dynamic linear programming techniques to estimate the energy consumption of different DNN configurations and identify the optimal configuration that minimizes energy consumption while maintaining the desired accuracy level.</p>
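The partitioned, early-exit scheme described in the abstract can be sketched in a few lines: the edge runs the first layers, exits locally when its confidence is high enough, and otherwise ships bottleneck-compressed features over the (slow) uplink for the cloud to finish. This is an illustrative toy, not the thesis implementation; every function name and the stand-in "layers" below are assumptions made for the sketch.

```python
# Toy sketch of edge/cloud split inference with an early exit and a
# bottleneck. All names and the fake "layers" are illustrative only.

def edge_stage(x):
    """First few layers, cheap enough for a microcontroller (toy stand-in)."""
    features = [v * 0.5 for v in x]                  # pretend convolution
    total = sum(abs(v) for v in features) + 1e-9
    confidence = max(features) / total               # crude softmax-like score
    prediction = features.index(max(features))
    return features, prediction, confidence

def bottleneck(features, keep=2):
    """Compress intermediate features before the slow (e.g. LoRa) uplink."""
    return features[:keep]                           # pretend learned compression

def cloud_stage(compressed):
    """Remaining layers, run only when the edge was not confident enough."""
    scores = [v * 2.0 for v in compressed]           # pretend deeper network
    return scores.index(max(scores))

def infer(x, threshold=0.5):
    """Early-exit on the edge if confident; otherwise offload to the cloud."""
    features, pred, conf = edge_stage(x)
    if conf >= threshold:                            # early exit: no uplink traffic
        return pred, "edge"
    return cloud_stage(bottleneck(features)), "cloud"
```

A confident input such as `[9.0, 0.1, 0.1]` exits at the edge, while an ambiguous one such as `[1.0, 1.0, 1.0]` is compressed and offloaded; tuning `threshold` trades uplink traffic and latency against accuracy, which is the knob the distribution approach optimizes.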
first_indexed 2024-03-07T08:10:34Z
format Thesis
id oxford-uuid:14ed4195-1bf6-448a-8cfd-59474dacb3cf
institution University of Oxford
language English
last_indexed 2024-03-07T08:10:34Z
publishDate 2023
record_format dspace
spelling oxford-uuid:14ed4195-1bf6-448a-8cfd-59474dacb3cf2023-11-27T10:15:10ZCloud/edge inference for deep neural networksThesishttp://purl.org/coar/resource_type/c_db06uuid:14ed4195-1bf6-448a-8cfd-59474dacb3cfEnglishHyrax Deposit2023Sbai, MTrigoni, AMarkham, A
spellingShingle Sbai, M
Cloud/edge inference for deep neural networks
title Cloud/edge inference for deep neural networks
title_full Cloud/edge inference for deep neural networks
title_fullStr Cloud/edge inference for deep neural networks
title_full_unstemmed Cloud/edge inference for deep neural networks
title_short Cloud/edge inference for deep neural networks
title_sort cloud edge inference for deep neural networks
work_keys_str_mv AT sbaim cloudedgeinferencefordeepneuralnetworks