Flexible Energy-Aware Image and Transformer Processors for Edge Computing

Machine learning inference on edge devices for image and language processing has become increasingly common in recent years, but it faces challenges from high memory and computation requirements coupled with limited energy resources. This work applies different quantization schemes and training techniques to reduce the cost of running these models and to provide flexibility in the hardware. Energy scalability is achieved through bit-width scaling as well as model size scaling. These techniques are applied to three neural network accelerators, which have been taped out and tested, to enable efficient inference for a variety of applications. The first chip is a CNN accelerator that simplifies computation with nonlinearly quantized weights by reordering multiplication and accumulation. This modified computation requires additional storage elements compared to a conventional approach, so a custom accumulator array layout is designed to minimize the area overhead. The second chip targets moderately sized Transformer models (e.g., ALBERT) using piecewise-linear quantization (PWLQ) for both weights and activations. Lastly, an energy-adaptive accelerator for natural language understanding based on lightweight Transformer models is presented. Its model size can be adjusted by sampling the weights of the full model to obtain differently sized submodels, without the memory overhead of storing multiple models.
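
The reordering idea behind the first chip can be illustrated in software. When every weight is restricted to a small set of nonlinear levels, a dot product can be computed by accumulating activations into one bin per weight level and then multiplying each bin total by its level value once at the end; the per-level bins are the extra storage elements mentioned above. The sketch below is only an illustration of that reordering, assuming a power-of-two-style level set and function names chosen for the example, not the thesis design.

```python
import numpy as np

# Illustrative nonlinear weight levels (assumed here; the chip's actual
# codebook is not specified in the abstract).
LEVELS = np.array([-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0])

def quantize_to_levels(w):
    """Map each weight to the index of its nearest level."""
    return np.abs(w[:, None] - LEVELS[None, :]).argmin(axis=1)

def dot_reordered(level_idx, x):
    """Accumulate activations per weight level, then multiply once per level.

    Needs one accumulator ("bin") per level -- the extra storage -- but
    replaces per-element multiplications with a single multiply per level.
    """
    bins = np.zeros(len(LEVELS))
    for idx, xi in zip(level_idx, x):
        bins[idx] += xi                      # accumulation only, no multiply
    return float(np.dot(bins, LEVELS))       # one multiply per level

# Check against the conventional multiply-then-accumulate order.
rng = np.random.default_rng(0)
w, x = rng.uniform(-1, 1, 64), rng.uniform(-1, 1, 64)
idx = quantize_to_levels(w)
assert np.isclose(dot_reordered(idx, x), np.dot(LEVELS[idx], x))
```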

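Piecewise-linear quantization (PWLQ), used on the second chip, splits the value range at a breakpoint and applies uniform quantization separately within each piece, so the dense center of a bell-shaped weight distribution gets a finer step size than a single uniform grid would allow. A minimal two-piece sketch, with an arbitrary breakpoint and bit width rather than the settings actually used on the chip:

```python
import numpy as np

def uniform_quantize(x, lo, hi, bits):
    """Uniformly quantize values in [lo, hi] onto 2**bits levels."""
    step = (hi - lo) / (2 ** bits - 1)
    return lo + np.round((np.clip(x, lo, hi) - lo) / step) * step

def pwlq(x, breakpoint, bits=4):
    """Two-piece symmetric piecewise-linear quantization (illustrative only).

    Values inside [-breakpoint, breakpoint] use one uniform grid; the tails
    use another, so each region gets its own step size.
    """
    m = np.max(np.abs(x))
    center = np.abs(x) <= breakpoint
    out = np.empty_like(x, dtype=float)
    out[center] = uniform_quantize(x[center], -breakpoint, breakpoint, bits)
    out[~center] = np.sign(x[~center]) * uniform_quantize(
        np.abs(x[~center]), breakpoint, m, bits)
    return out

w = np.random.default_rng(1).normal(0.0, 0.5, 1000)
wq = pwlq(w, breakpoint=0.5 * np.max(np.abs(w)), bits=4)
print("mean squared quantization error:", np.mean((w - wq) ** 2))
```

The third chip's energy-adaptive scaling can be pictured as carving submodels out of a single stored weight set: a narrower configuration reuses a subset of the full model's weights rather than storing a separate smaller model. A hedged sketch under assumed layer shapes and a "keep the leading units" sampling rule, which is just one possible way to sample the full weights:

```python
import numpy as np

rng = np.random.default_rng(2)
W_full = rng.normal(size=(256, 256))   # one full hidden layer, stored once
b_full = rng.normal(size=256)

def sample_submodel(W, b, width_fraction):
    """Return views into the full weights for a narrower submodel.

    Only the leading units are kept here; any consistent sampling of the
    stored weights works, and no second model is stored.
    """
    n = int(W.shape[0] * width_fraction)
    return W[:n, :n], b[:n]

def forward(x, W, b):
    return np.maximum(W @ x + b, 0.0)   # simple ReLU layer

x_full = rng.normal(size=256)
for frac in (1.0, 0.5, 0.25):           # scale model size to the energy budget
    W_sub, b_sub = sample_submodel(W_full, b_full, frac)
    y = forward(x_full[: W_sub.shape[1]], W_sub, b_sub)
    print(f"width fraction {frac}: output dimension {y.shape[0]}")
```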

Bibliographic Details
Main Author: Ji, Alex
Other Authors: Chandrakasan, Anantha P.
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: Ph.D.
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); copyright retained by author(s); https://creativecommons.org/licenses/by-nc-nd/4.0/
Online Access: https://hdl.handle.net/1721.1/152854
https://orcid.org/0009-0000-7720-9951