Cloud/edge inference for deep neural networks

Bibliographic Details
Main Author: Sbai, M
Other Authors: Trigoni, A
Format: Thesis
Language:English
Published: 2023
_version_ 1797111452509667328
author Sbai, M
author2 Trigoni, A
author_facet Trigoni, A
Sbai, M
author_sort Sbai, M
collection OXFORD
description <p>In the past decade, deep learning has become one of the most dominant data analysis approaches, owing to its ability to achieve impressively high accuracy on a variety of key computing tasks. Some of these tasks require intensive computation, especially when image and video data captured from cameras are input to the predictive model. In traditional systems such as video surveillance, these tasks are offloaded to a cloud server, which automatically processes the collected data using deep learning algorithms. This solution has limitations, most notably high latency and network load, which impact the performance of real-time applications. The alternative solution, embedding deep neural networks (DNNs) directly on the edge device (e.g. a camera), is severely limited by the device's computation and storage resources. More recently, research has examined approaches that distribute the inference of machine learning algorithms between the edge and the cloud. However, despite some promising results, several fundamental drawbacks remain in terms of accuracy, efficiency, and applicability to resource-constrained environments.</p> <p>The work presented in this thesis tackles these shortcomings by proposing a novel partitioning approach for DNN inference between the edge and the cloud. This is the first work to consider simultaneous optimization of both the memory usage at the edge and the size of the data to be transferred over the wireless link. The experiments were performed on two different network architectures, MobileNetV1 and VGG16. The proposed approach makes it possible to execute part of the network on very constrained devices (e.g. microcontrollers) and under poor network conditions (e.g. LoRa) whilst retaining reasonable accuracy.</p> <p>The second contribution deals with DNN distribution on a cloud/multi-edge execution platform. By including early exits and bottlenecks in the neural network, the proposed solution makes it possible to distribute computation according to time and accuracy requirements. In scenarios where communication with the cloud is very limited (e.g. LoRa), it allows an 80% improvement in average latency compared to a traditional cloud solution.</p> <p>Finally, we deal with power-constrained devices. We propose an energy consumption estimation and optimization approach for DNNs with multiple early exits in a platform composed of several edge devices and a cloud server. The approach utilizes machine learning combined with dynamic linear programming techniques to estimate the energy consumption of different DNN configurations and identify the optimal configuration that minimizes energy consumption while maintaining the desired accuracy level.</p>
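The partitioned, early-exit scheme described in the abstract can be sketched in a few lines: the edge runs the first layers, exits locally when its confidence is high enough, and otherwise ships bottleneck-compressed features over the (slow) uplink for the cloud to finish. This is an illustrative toy, not the thesis implementation; every function name and the stand-in "layers" below are assumptions made for the sketch.

```python
# Toy sketch of edge/cloud split inference with an early exit and a
# bottleneck. All names and the fake "layers" are illustrative only.

def edge_stage(x):
    """First few layers, cheap enough for a microcontroller (toy stand-in)."""
    features = [v * 0.5 for v in x]                  # pretend convolution
    total = sum(abs(v) for v in features) + 1e-9
    confidence = max(features) / total               # crude softmax-like score
    prediction = features.index(max(features))
    return features, prediction, confidence

def bottleneck(features, keep=2):
    """Compress intermediate features before the slow (e.g. LoRa) uplink."""
    return features[:keep]                           # pretend learned compression

def cloud_stage(compressed):
    """Remaining layers, run only when the edge was not confident enough."""
    scores = [v * 2.0 for v in compressed]           # pretend deeper network
    return scores.index(max(scores))

def infer(x, threshold=0.5):
    """Early-exit on the edge if confident; otherwise offload to the cloud."""
    features, pred, conf = edge_stage(x)
    if conf >= threshold:                            # early exit: no uplink traffic
        return pred, "edge"
    return cloud_stage(bottleneck(features)), "cloud"
```

A confident input such as `[9.0, 0.1, 0.1]` exits at the edge, while an ambiguous one such as `[1.0, 1.0, 1.0]` is compressed and offloaded; tuning `threshold` trades uplink traffic and latency against accuracy, which is the knob the distribution approach optimizes.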
first_indexed 2024-03-07T08:10:34Z
format Thesis
id oxford-uuid:14ed4195-1bf6-448a-8cfd-59474dacb3cf
institution University of Oxford
language English
last_indexed 2024-03-07T08:10:34Z
publishDate 2023
record_format dspace
spelling oxford-uuid:14ed4195-1bf6-448a-8cfd-59474dacb3cf2023-11-27T10:15:10ZCloud/edge inference for deep neural networksThesishttp://purl.org/coar/resource_type/c_db06uuid:14ed4195-1bf6-448a-8cfd-59474dacb3cfEnglishHyrax Deposit2023Sbai, MTrigoni, AMarkham, A
spellingShingle Sbai, M
Cloud/edge inference for deep neural networks
title Cloud/edge inference for deep neural networks
title_full Cloud/edge inference for deep neural networks
title_fullStr Cloud/edge inference for deep neural networks
title_full_unstemmed Cloud/edge inference for deep neural networks
title_short Cloud/edge inference for deep neural networks
title_sort cloud edge inference for deep neural networks
work_keys_str_mv AT sbaim cloudedgeinferencefordeepneuralnetworks