Efficient Algorithms and Systems for Tiny Deep Learning

Tiny machine learning on IoT devices based on microcontroller units (MCUs) enables various real-world applications (e.g., keyword spotting, anomaly detection). However, deploying deep learning models to MCUs is challenging due to the limited memory size: the memory of microcontrollers is 2-3 orders...

Full description

Bibliographic Details
Main Author:	Lin, Ji
Other Authors:	Han, Song
Format:	Thesis
Published:	Massachusetts Institute of Technology 2022
Online Access:	https://hdl.handle.net/1721.1/139171

_version_	1826193922335965184
author	Lin, Ji
author2	Han, Song
author_facet	Han, Song Lin, Ji
author_sort	Lin, Ji
collection	MIT
description	Tiny machine learning on IoT devices based on microcontroller units (MCUs) enables various real-world applications (e.g., keyword spotting, anomaly detection). However, deploying deep learning models to MCUs is challenging due to the limited memory size: the memory of microcontrollers is 2-3 orders of magnitude smaller even than mobile phones. In this thesis, we study efficient algorithms and systems for tiny-scale deep learning. We propose MCUNet, a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on microcontrollers. TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints, then specializes the network architecture in the optimized search space. TinyNAS can automatically handle diverse constraints (i.e. device, latency, energy, memory) under low search costs. TinyNAS is co-designed with TinyEngine, a memory-efficient inference library to expand the search space and fit a larger model. TinyEngine adapts the memory scheduling according to the overall network topology rather than layer-wise optimization, reducing the memory usage by 3.4×, and accelerating the inference by 1.7-3.3× compared to TF-Lite Micro and CMSIS-NN. For vision applications on MCUs, we diagnosed and found that existing convolutional neural network (CNN) designs have an imbalanced peak memory distribution: the first several layers have much higher peak memory usage than the rest of the network. Based on the observation, we further extend the framework to support patch-based inference to break the memory bottleneck of the initial stage. MCUNet is the first to achieves >70% ImageNet top1 accuracy on an off-the-shelf commercial microcontroller, using 3.5× less SRAM and 5.7× less Flash compared to quantized MobileNetV2 and ResNet-18. On visual&audio wake words tasks, MCUNet achieves state-of-the-art accuracy and runs 2.4- 3.4× faster than MobileNetV2 and ProxylessNAS-based solutions with 3.7-4.1× smaller peak SRAM. Our study suggests that the era of always-on tiny machine learning on IoT devices has arrived.
first_indexed	2024-09-23T09:47:25Z
format	Thesis
id	mit-1721.1/139171
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T09:47:25Z
publishDate	2022
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/1391712022-01-15T03:58:01Z Efficient Algorithms and Systems for Tiny Deep Learning Lin, Ji Han, Song Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Tiny machine learning on IoT devices based on microcontroller units (MCUs) enables various real-world applications (e.g., keyword spotting, anomaly detection). However, deploying deep learning models to MCUs is challenging due to the limited memory size: the memory of microcontrollers is 2-3 orders of magnitude smaller even than mobile phones. In this thesis, we study efficient algorithms and systems for tiny-scale deep learning. We propose MCUNet, a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on microcontrollers. TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints, then specializes the network architecture in the optimized search space. TinyNAS can automatically handle diverse constraints (i.e. device, latency, energy, memory) under low search costs. TinyNAS is co-designed with TinyEngine, a memory-efficient inference library to expand the search space and fit a larger model. TinyEngine adapts the memory scheduling according to the overall network topology rather than layer-wise optimization, reducing the memory usage by 3.4×, and accelerating the inference by 1.7-3.3× compared to TF-Lite Micro and CMSIS-NN. For vision applications on MCUs, we diagnosed and found that existing convolutional neural network (CNN) designs have an imbalanced peak memory distribution: the first several layers have much higher peak memory usage than the rest of the network. Based on the observation, we further extend the framework to support patch-based inference to break the memory bottleneck of the initial stage. MCUNet is the first to achieves >70% ImageNet top1 accuracy on an off-the-shelf commercial microcontroller, using 3.5× less SRAM and 5.7× less Flash compared to quantized MobileNetV2 and ResNet-18. On visual&audio wake words tasks, MCUNet achieves state-of-the-art accuracy and runs 2.4- 3.4× faster than MobileNetV2 and ProxylessNAS-based solutions with 3.7-4.1× smaller peak SRAM. Our study suggests that the era of always-on tiny machine learning on IoT devices has arrived. S.M. 2022-01-14T14:54:31Z 2022-01-14T14:54:31Z 2021-06 2021-06-24T19:24:57.305Z Thesis https://hdl.handle.net/1721.1/139171 0000-0001-6053-4344 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle	Lin, Ji Efficient Algorithms and Systems for Tiny Deep Learning
title	Efficient Algorithms and Systems for Tiny Deep Learning
title_full	Efficient Algorithms and Systems for Tiny Deep Learning
title_fullStr	Efficient Algorithms and Systems for Tiny Deep Learning
title_full_unstemmed	Efficient Algorithms and Systems for Tiny Deep Learning
title_short	Efficient Algorithms and Systems for Tiny Deep Learning
title_sort	efficient algorithms and systems for tiny deep learning
url	https://hdl.handle.net/1721.1/139171
work_keys_str_mv	AT linji efficientalgorithmsandsystemsfortinydeeplearning

Efficient Algorithms and Systems for Tiny Deep Learning

Similar Items