All Analog CNN Accelerator with RRAMs for Fast Inference


Bibliographic Details
Main Author: Chao, Minghan
Other Authors: Shulaker, Max
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/146297
_version_ 1826210639322808320
author Chao, Minghan
author2 Shulaker, Max
author_facet Shulaker, Max
Chao, Minghan
author_sort Chao, Minghan
collection MIT
description As AI applications become more prevalent and powerful, the performance demands on deep learning neural networks grow. The need for fast and energy-efficient circuits to compute deep neural networks is urgent. Most current research proposes dedicated hardware in which data is reused thousands of times. However, while re-using the same hardware to perform the same computation repeatedly saves area, it comes at the expense of execution time. This presents another critical obstacle, as the need for real-time, rapid AI requires a fundamentally faster approach to implementing neural networks. The focus of this thesis is to duplicate the key operation, the multiply-and-accumulate (MAC) computation unit, in hardware so that there is no hardware re-use, enabling the entire neural network to be physically fabricated on a single chip. As neural networks today often require hundreds of thousands to tens of millions of MAC computation units, this requires designing the smallest possible MAC computation units so that all of the operations fit on chip. Here, we present an initial analysis of a convolutional neural network (CNN) accelerator that implements such a system, optimized for inference speed. The accelerator duplicates all of the computation hardware, eliminating the need to fetch data back and forth while reusing the same hardware. We propose a novel design for memory cells using resistive random-access memory (RRAM) and computation units that exploit the analog behavior of transistors. This circuit classifies one CIFAR-10 image in 6 µs (160k frames/s) with 2.4 µJ of energy per classification at 85% accuracy. It contains 7.5 million MAC units and achieves a density of 5 million MAC/mm².
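The headline figures in the abstract imply a few derived metrics worth checking. A back-of-the-envelope sketch, using only the numbers stated above (6 µs latency, 2.4 µJ per classification, 7.5 million MAC units, 5 million MAC/mm²):

```python
# Derived metrics from the figures reported in the thesis abstract.
latency_s = 6e-6       # seconds per CIFAR-10 image
energy_j = 2.4e-6      # joules per classification
mac_units = 7.5e6      # total MAC units on chip
mac_density = 5e6      # MAC units per mm^2

throughput_fps = 1 / latency_s              # ~166,667 frames/s (abstract rounds to 160k)
avg_power_w = energy_j / latency_s          # average power while classifying: 0.4 W
implied_area_mm2 = mac_units / mac_density  # MAC-array area implied by the density: 1.5 mm^2

print(f"{throughput_fps:,.0f} frames/s, {avg_power_w:.1f} W, {implied_area_mm2:.1f} mm^2")
```

The ~0.4 W average power and ~1.5 mm² implied MAC area are not stated in the record; they follow arithmetically from the reported latency, energy, and density.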
first_indexed 2024-09-23T14:53:09Z
format Thesis
id mit-1721.1/146297
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T14:53:09Z
publishDate 2022
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/1462972022-11-11T03:11:24Z All Analog CNN Accelerator with RRAMs for Fast Inference Chao, Minghan Shulaker, Max Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science As AI applications become more prevalent and powerful, the performance demands on deep learning neural networks grow. The need for fast and energy-efficient circuits to compute deep neural networks is urgent. Most current research proposes dedicated hardware in which data is reused thousands of times. However, while re-using the same hardware to perform the same computation repeatedly saves area, it comes at the expense of execution time. This presents another critical obstacle, as the need for real-time, rapid AI requires a fundamentally faster approach to implementing neural networks. The focus of this thesis is to duplicate the key operation, the multiply-and-accumulate (MAC) computation unit, in hardware so that there is no hardware re-use, enabling the entire neural network to be physically fabricated on a single chip. As neural networks today often require hundreds of thousands to tens of millions of MAC computation units, this requires designing the smallest possible MAC computation units so that all of the operations fit on chip. Here, we present an initial analysis of a convolutional neural network (CNN) accelerator that implements such a system, optimized for inference speed. The accelerator duplicates all of the computation hardware, eliminating the need to fetch data back and forth while reusing the same hardware. We propose a novel design for memory cells using resistive random-access memory (RRAM) and computation units that exploit the analog behavior of transistors. This circuit classifies one CIFAR-10 image in 6 µs (160k frames/s) with 2.4 µJ of energy per classification at 85% accuracy. It contains 7.5 million MAC units and achieves a density of 5 million MAC/mm². S.M.
2022-11-10T14:05:03Z 2022-11-10T14:05:03Z 2022-02 2022-03-04T20:59:41.259Z Thesis https://hdl.handle.net/1721.1/146297 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf application/pdf Massachusetts Institute of Technology
spellingShingle Chao, Minghan
All Analog CNN Accelerator with RRAMs for Fast Inference
title All Analog CNN Accelerator with RRAMs for Fast Inference
title_full All Analog CNN Accelerator with RRAMs for Fast Inference
title_fullStr All Analog CNN Accelerator with RRAMs for Fast Inference
title_full_unstemmed All Analog CNN Accelerator with RRAMs for Fast Inference
title_short All Analog CNN Accelerator with RRAMs for Fast Inference
title_sort all analog cnn accelerator with rrams for fast inference
url https://hdl.handle.net/1721.1/146297
work_keys_str_mv AT chaominghan allanalogcnnacceleratorwithrramsforfastinference