Efficient, Accurate, and Flexible PIM Inference through Adaptable Low-Resolution Arithmetic
Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural Network (DNN) inference by reducing costly data movement and by using resistive RAM (ReRAM) for efficient analog compute. Unfortunately, overall PIM accelerator efficiency and throughput are limited by area/energy-intensive analog-to-digital converters (ADCs).
Main Author: | Andrulis, Tanner
---|---
Other Authors: | Emer, Joel S.
Format: | Thesis
Published: | Massachusetts Institute of Technology, 2023
Online Access: | https://hdl.handle.net/1721.1/151461
_version_ | 1826213829217878016
author | Andrulis, Tanner |
author2 | Emer, Joel S. |
author_facet | Emer, Joel S.; Andrulis, Tanner
author_sort | Andrulis, Tanner |
collection | MIT |
description | Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural Network (DNN) inference by reducing costly data movement and by using resistive RAM (ReRAM) for efficient analog compute. Unfortunately, overall PIM accelerator efficiency and throughput are limited by area/energy-intensive analog-to-digital converters (ADCs). Furthermore, existing accelerators that reduce ADC area/energy do so by changing DNN weights or by using low-resolution ADCs that reduce output fidelity. These approaches harm DNN accuracy and/or require costly DNN retraining to compensate.
To address these issues, this thesis explores tradeoffs around ADC area/energy and develops optimizations that can reduce ADC area/energy without retraining DNNs. We use these optimizations to develop a new PIM accelerator, RAELLA, which can adapt the architecture to each DNN. RAELLA lowers the resolution of computed analog values by encoding weights to produce near-zero analog values, adaptively slicing weights for each DNN layer, and dynamically slicing inputs through speculation and recovery. Low-resolution analog values allow RAELLA to both use efficient low-resolution ADCs and maintain accuracy without retraining, all while computing with fewer ADC converts.
Compared to other low-accuracy-loss PIM accelerators, RAELLA increases energy efficiency by up to 4.9x and throughput by up to 3.3x. Compared to PIM accelerators that cause accuracy loss and retrain DNNs to recover, RAELLA achieves similar efficiency and throughput without expensive DNN retraining. |
first_indexed | 2024-09-23T15:55:29Z |
format | Thesis |
id | mit-1721.1/151461 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T15:55:29Z |
publishDate | 2023 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1514612023-08-01T03:08:18Z Efficient, Accurate, and Flexible PIM Inference through Adaptable Low-Resolution Arithmetic Andrulis, Tanner Emer, Joel S. Sze, Vivienne Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science S.M.
2023-07-31T19:41:27Z 2023-07-31T19:41:27Z 2023-06 2023-07-13T14:13:58.566Z Thesis https://hdl.handle.net/1721.1/151461 0000-0002-3168-9862 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Andrulis, Tanner Efficient, Accurate, and Flexible PIM Inference through Adaptable Low-Resolution Arithmetic |
title | Efficient, Accurate, and Flexible PIM Inference through Adaptable Low-Resolution Arithmetic |
title_full | Efficient, Accurate, and Flexible PIM Inference through Adaptable Low-Resolution Arithmetic |
title_fullStr | Efficient, Accurate, and Flexible PIM Inference through Adaptable Low-Resolution Arithmetic |
title_full_unstemmed | Efficient, Accurate, and Flexible PIM Inference through Adaptable Low-Resolution Arithmetic |
title_short | Efficient, Accurate, and Flexible PIM Inference through Adaptable Low-Resolution Arithmetic |
title_sort | efficient accurate and flexible pim inference through adaptable low resolution arithmetic |
url | https://hdl.handle.net/1721.1/151461 |
work_keys_str_mv | AT andrulistanner efficientaccurateandflexiblepiminferencethroughadaptablelowresolutionarithmetic |