Flexible Energy-Aware Image and Transformer Processors for Edge Computing
Machine learning inference on edge devices for image and language processing has become increasingly common in recent years, but faces challenges associated with high memory and computation requirements, coupled with limited energy resources. This work applies different quantization schemes and training techniques to reduce the cost of running these models and provide flexibility in the hardware. Energy scalability is achieved through bit width scaling, as well as model size scaling. These techniques are applied to three neural network accelerators, which have been taped out and tested, to enable efficient inference for a variety of applications.
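The abstract refers to energy scalability through bit width scaling. As a rough, illustrative sketch only (not the implementation described in the thesis), the snippet below shows how a symmetric uniform quantizer can trade accuracy against arithmetic cost by changing its bit width at inference time; the function name and scaling rule are assumptions for illustration.

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric fake-quantization at a configurable bit width.

    Fewer bits -> coarser levels -> cheaper, lower-energy arithmetic,
    at some cost in accuracy. Generic sketch, not the thesis design.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 at 8 bits, 7 at 4 bits
    scale = float(np.max(np.abs(x))) / qmax
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                           # dequantized values

# The same weights evaluated at two precisions
w = np.random.randn(16)
w8 = quantize_symmetric(w, bits=8)             # higher accuracy, higher energy
w4 = quantize_symmetric(w, bits=4)             # lower accuracy, lower energy
```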
Main Author: | Ji, Alex |
---|---|
Other Authors: | Chandrakasan, Anantha P. |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2023 |
Online Access: | https://hdl.handle.net/1721.1/152854 https://orcid.org/0009-0000-7720-9951 |
author | Ji, Alex |
author2 | Chandrakasan, Anantha P. |
collection | MIT |
description | Machine learning inference on edge devices for image and language processing has become increasingly common in recent years, but faces challenges associated with high memory and computation requirements, coupled with limited energy resources. This work applies different quantization schemes and training techniques to reduce the cost of running these models and provide flexibility in the hardware. Energy scalability is achieved through bit width scaling, as well as model size scaling. These techniques are applied to three neural network accelerators, which have been taped out and tested, to enable efficient inference for a variety of applications.
The first chip is a CNN accelerator that simplifies computation using nonlinearly quantized weights by reordering multiplication and accumulation. This modified computation requires additional storage elements compared to a conventional approach. To minimize the area overhead, a custom accumulator array layout is designed. The second chip targets moderately-sized Transformer models (e.g., ALBERT) using piecewise-linear quantization (PWLQ) for both weights and activations. Lastly, an energy-adaptive accelerator for natural language understanding based on lightweight Transformer models is presented. The model size can be adjusted by sampling the weights of the full model to obtain differently sized submodels, without the memory overhead of storing multiple models. |
format | Thesis |
id | mit-1721.1/152854 |
institution | Massachusetts Institute of Technology |
publishDate | 2023 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
degree | Ph.D.
date_issued | 2023-09
rights | Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); Copyright retained by author(s); https://creativecommons.org/licenses/by-nc-nd/4.0/
file_format | application/pdf
title | Flexible Energy-Aware Image and Transformer Processors for Edge Computing |
url | https://hdl.handle.net/1721.1/152854 https://orcid.org/0009-0000-7720-9951 |
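The description above notes that the first chip simplifies computation with nonlinearly quantized weights by reordering multiplication and accumulation. Below is a minimal functional sketch of that reordering under the assumption of a small codebook of weight levels: activations are first summed into one accumulator per level (the extra storage elements the description mentions), and each partial sum is then multiplied by its level value only once. The codebook values and function names are illustrative assumptions, not the chip's actual datapath.

```python
import numpy as np

def dot_reordered(weight_idx: np.ndarray, x: np.ndarray, levels: np.ndarray) -> float:
    """Dot product with codebook-quantized weights, multiplications deferred.

    weight_idx[i] selects an entry of `levels` (the nonlinear codebook).
    Activations are accumulated per level first, so only len(levels)
    multiplications are needed at the end, at the cost of one
    accumulator per level.
    """
    acc = np.zeros(len(levels))                 # one accumulator per weight level
    for idx, xi in zip(weight_idx, x):
        acc[idx] += xi                          # accumulate only, no multiply yet
    return float(np.dot(acc, levels))           # one multiply per level

# Check against the conventional multiply-then-accumulate order
levels = np.array([-0.5, -0.25, 0.0, 0.25, 0.5])     # assumed codebook
widx = np.random.randint(0, len(levels), size=64)
x = np.random.randn(64)
assert np.isclose(dot_reordered(widx, x, levels), np.dot(levels[widx], x))
```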
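The description also states that the third accelerator adjusts model size by sampling the weights of the full model to obtain differently sized submodels, avoiding the memory overhead of storing multiple models. The sketch below illustrates that general idea for a single linear layer, under the assumption (mine, not the thesis's) that a submodel simply takes a leading slice of the full weight matrix.

```python
import numpy as np

class ScalableLinear:
    """One stored weight matrix; submodels are slices of it (no extra storage)."""

    def __init__(self, in_features: int, out_features: int):
        self.weight = np.random.randn(out_features, in_features) * 0.02

    def forward(self, x: np.ndarray, width_frac: float = 1.0) -> np.ndarray:
        """Run with a fraction of the output units, sampled from the full weights."""
        n_out = max(1, int(round(self.weight.shape[0] * width_frac)))
        w_sub = self.weight[:n_out, :]          # a view into the full model, not a copy
        return x @ w_sub.T

layer = ScalableLinear(in_features=128, out_features=64)
x = np.random.randn(1, 128)
y_full = layer.forward(x, width_frac=1.0)       # full model: best accuracy
y_half = layer.forward(x, width_frac=0.5)       # submodel: lower energy
```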