Flexible Energy-Aware Image and Transformer Processors for Edge Computing

Machine learning inference on edge devices for image and language processing has become increasingly common in recent years, but it faces challenges from high memory and computation requirements coupled with limited energy resources. This work applies different quantization schemes and training techniques to reduce the cost of running these models and to provide flexibility in the hardware. Energy scalability is achieved through bit-width scaling as well as model size scaling. These techniques are applied to three neural network accelerators, which have been taped out and tested, to enable efficient inference for a variety of applications. The first chip is a CNN accelerator that simplifies computation with nonlinearly quantized weights by reordering multiplication and accumulation. This modified computation requires additional storage elements compared to a conventional approach, so a custom accumulator array layout is designed to minimize the area overhead. The second chip targets moderately sized Transformer models (e.g., ALBERT) using piecewise-linear quantization (PWLQ) for both weights and activations. Lastly, an energy-adaptive accelerator for natural language understanding based on lightweight Transformer models is presented. Its model size can be adjusted by sampling the weights of the full model to obtain differently sized submodels, without the memory overhead of storing multiple models.
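
The reordering idea behind the first chip can be illustrated in software. When every weight is restricted to a small set of nonlinear levels, a dot product can be computed by accumulating activations into one bin per weight level and then multiplying each bin total by its level value once at the end; the per-level bins are the extra storage elements mentioned above. The sketch below is only an illustration of that reordering, assuming a power-of-two-style level set and function names chosen for the example, not the thesis design.

```python
import numpy as np

# Illustrative nonlinear weight levels (assumed here; the chip's actual
# codebook is not specified in the abstract).
LEVELS = np.array([-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0])

def quantize_to_levels(w):
    """Map each weight to the index of its nearest level."""
    return np.abs(w[:, None] - LEVELS[None, :]).argmin(axis=1)

def dot_reordered(level_idx, x):
    """Accumulate activations per weight level, then multiply once per level.

    Needs one accumulator ("bin") per level -- the extra storage -- but
    replaces per-element multiplications with a single multiply per level.
    """
    bins = np.zeros(len(LEVELS))
    for idx, xi in zip(level_idx, x):
        bins[idx] += xi                      # accumulation only, no multiply
    return float(np.dot(bins, LEVELS))       # one multiply per level

# Check against the conventional multiply-then-accumulate order.
rng = np.random.default_rng(0)
w, x = rng.uniform(-1, 1, 64), rng.uniform(-1, 1, 64)
idx = quantize_to_levels(w)
assert np.isclose(dot_reordered(idx, x), np.dot(LEVELS[idx], x))
```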

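Piecewise-linear quantization (PWLQ), used on the second chip, splits the value range at a breakpoint and applies uniform quantization separately within each piece, so the dense center of a bell-shaped weight distribution gets a finer step size than a single uniform grid would allow. A minimal two-piece sketch, with an arbitrary breakpoint and bit width rather than the settings actually used on the chip:

```python
import numpy as np

def uniform_quantize(x, lo, hi, bits):
    """Uniformly quantize values in [lo, hi] onto 2**bits levels."""
    step = (hi - lo) / (2 ** bits - 1)
    return lo + np.round((np.clip(x, lo, hi) - lo) / step) * step

def pwlq(x, breakpoint, bits=4):
    """Two-piece symmetric piecewise-linear quantization (illustrative only).

    Values inside [-breakpoint, breakpoint] use one uniform grid; the tails
    use another, so each region gets its own step size.
    """
    m = np.max(np.abs(x))
    center = np.abs(x) <= breakpoint
    out = np.empty_like(x, dtype=float)
    out[center] = uniform_quantize(x[center], -breakpoint, breakpoint, bits)
    out[~center] = np.sign(x[~center]) * uniform_quantize(
        np.abs(x[~center]), breakpoint, m, bits)
    return out

w = np.random.default_rng(1).normal(0.0, 0.5, 1000)
wq = pwlq(w, breakpoint=0.5 * np.max(np.abs(w)), bits=4)
print("mean squared quantization error:", np.mean((w - wq) ** 2))
```

The third chip's energy-adaptive scaling can be pictured as carving submodels out of a single stored weight set: a narrower configuration reuses a subset of the full model's weights rather than storing a separate smaller model. A hedged sketch under assumed layer shapes and a "keep the leading units" sampling rule, which is just one possible way to sample the full weights:

```python
import numpy as np

rng = np.random.default_rng(2)
W_full = rng.normal(size=(256, 256))   # one full hidden layer, stored once
b_full = rng.normal(size=256)

def sample_submodel(W, b, width_fraction):
    """Return views into the full weights for a narrower submodel.

    Only the leading units are kept here; any consistent sampling of the
    stored weights works, and no second model is stored.
    """
    n = int(W.shape[0] * width_fraction)
    return W[:n, :n], b[:n]

def forward(x, W, b):
    return np.maximum(W @ x + b, 0.0)   # simple ReLU layer

x_full = rng.normal(size=256)
for frac in (1.0, 0.5, 0.25):           # scale model size to the energy budget
    W_sub, b_sub = sample_submodel(W_full, b_full, frac)
    y = forward(x_full[: W_sub.shape[1]], W_sub, b_sub)
    print(f"width fraction {frac}: output dimension {y.shape[0]}")
```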

Bibliographic Details
Main Author: Ji, Alex
Other Authors: Chandrakasan, Anantha P.
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: Ph.D.
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); copyright retained by author(s); https://creativecommons.org/licenses/by-nc-nd/4.0/
Online Access: https://hdl.handle.net/1721.1/152854
https://orcid.org/0009-0000-7720-9951