Energy-efficient AI hardware design for edge intelligence
Main Author: | Cao, Tiancheng |
---|---|
Other Authors: | Goh Wang Ling |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Engineering |
Online Access: | https://hdl.handle.net/10356/180196 |
_version_ | 1811690661154390016 |
---|---|
author | Cao, Tiancheng |
author2 | Goh Wang Ling |
author_facet | Goh Wang Ling Cao, Tiancheng |
author_sort | Cao, Tiancheng |
collection | NTU |
description | In the era of Artificial Intelligence (AI), the widespread adoption of AI applications has reached every facet of daily life, including the Internet of Things (IoT). However, the conventional Von Neumann architecture used in AI hardware implementations poses challenges, particularly for energy-efficient edge devices, where computational resources are constrained. Researchers are actively exploring solutions that leverage Computing-in-Memory (CIM) and Non-Volatile Memory (NVM) to optimize AI algorithms for these resource-limited environments. Deep neural networks (DNNs), known for their remarkable performance in computer vision and natural language processing, demand substantial computational resources, making them unsuitable for edge devices. CIM and NVM technologies offer potential breakthroughs but face challenges such as high static power and random telegraph noise. In conjunction with DNNs, these innovations are poised to enable intelligent IoT applications, striking a balance between computational efficiency, memory constraints, and power consumption on edge devices. As another edge solution, Field-Programmable Gate Arrays (FPGAs) offer accelerated performance for deep neural networks by enabling parallel processing at the hardware level, resulting in faster inference and reduced latency compared to traditional CPU or GPU implementations. Additionally, FPGAs provide flexibility through reconfigurability, allowing hardware architectures to be tailored to specific neural network models, improving both efficiency and power consumption.
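The CIM principle referred to above can be sketched in a few lines: a crossbar of programmable conductances computes a vector-matrix product directly in the analog domain via Ohm's law and Kirchhoff's current law, avoiding the Von Neumann memory-compute shuttling. The sketch below is purely illustrative; the conductance range and array size are assumptions, not values from the thesis:

```python
import numpy as np

# Illustrative sketch (not from the thesis): a memristor crossbar performs
# vector-matrix multiplication (VMM) in the analog domain. Each cell stores a
# conductance G[i, j]; applying read voltages v along the rows yields column
# currents i_out = G^T v by Ohm's law and Kirchhoff's current law.
rng = np.random.default_rng(0)

G = rng.uniform(1e-6, 1e-4, size=(4, 3))   # conductances in siemens (assumed range)
v = np.array([0.1, 0.2, 0.0, 0.3])         # read voltages applied to the 4 rows

i_out = G.T @ v                             # 3 column currents: the analog VMM result
```

The weights of a neural network layer map onto the conductance matrix, so one read operation computes an entire layer's dot products in place.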
The challenges in adopting resistive non-volatile memory (NVM) devices, specifically memristor-based crossbar arrays, for neural network applications are diverse. These hurdles include mitigating non-idealities introduced during memristor fabrication, such as parasitic resistances, device uniformity variations, and finite resistance states, which can significantly distort output currents and undermine vector-matrix multiplication (VMM) accuracy. Furthermore, the pressing demand for energy-efficient AI algorithms in areas such as ECG signal analysis and edge AI devices intensifies the need to overcome these challenges. Capacitive crossbar arrays offer potential energy-efficiency advantages but grapple with issues such as precise capacitance tuning and voltage sensitivity, while RRAM-based neural networks hold promise for energy efficiency but confront readout errors, uniformity issues, and other non-ideal factors. Addressing these multifaceted challenges is essential to harness the full potential of NVM devices and meet the growing demand for efficient neural network computation across diverse applications. In addition, developing an edge solution that leverages FPGA technology through high-level synthesis for real-time data processing and optimized performance in resource-constrained environments is also a pressing need.
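Two of the non-idealities named above, device-to-device variation and finite resistance states, can be illustrated numerically: perturb the programmed conductances, quantize them to a small set of achievable levels, and compare the resulting column currents against the ideal VMM. The noise model and level count below are assumptions chosen for illustration, not the thesis's measured device models:

```python
import numpy as np

# Hypothetical sketch of how fabrication non-idealities distort crossbar VMM.
rng = np.random.default_rng(1)

G_ideal = rng.uniform(1e-6, 1e-4, size=(8, 8))  # target conductances (assumed range)
v = rng.uniform(0.05, 0.2, size=8)              # read voltages

# Device variation: multiplicative lognormal perturbation (sigma is assumed).
G_var = G_ideal * rng.lognormal(mean=0.0, sigma=0.1, size=G_ideal.shape)

# Finite resistance states: snap each conductance to the nearest of 16 levels.
levels = np.linspace(G_ideal.min(), G_ideal.max(), 16)
G_real = levels[np.abs(G_var[..., None] - levels).argmin(axis=-1)]

i_ideal = G_ideal.T @ v
i_real = G_real.T @ v
rel_err = np.abs(i_real - i_ideal) / np.abs(i_ideal)  # per-column VMM error
```

Errors of this kind accumulate across layers, which is why the co-design and modeling work described in the thesis targets them explicitly.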
This Ph.D. thesis comprehensively investigates the implementation of AI acceleration on edge devices. The introduction describes the background and motivation of the research. Chapter 2 reviews state-of-the-art CIM systems, AI algorithms, and FPGA implementations and applications. In Chapter 3, a hybrid modeling approach combining complementary NVM devices is proposed, leading to significant improvements in hardware simulation speed and performance. In Chapter 4, a general software-hardware co-design deep learning framework implemented with NVM is employed to mitigate the impact of hardware non-idealities and reduce inference accuracy degradation. In Chapter 5, a more sophisticated deep learning framework, a customized PoolFormer implemented with a measured RRAM-based crossbar array, is adopted to realize a Transformer-based DNN on hardware. Chapter 6 implements AI algorithms on hardware through CIM design for various applications. Chapter 7 presents advances in high-level synthesis for FPGAs in edge applications such as speech recognition and signal processing. Finally, Chapter 8 concludes the thesis and proposes future work.
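The co-design idea behind Chapter 4, training the network under an injected device-noise model so that inference tolerates hardware non-idealities, follows the generic pattern often called noise-aware or hardware-aware training. A minimal hypothetical sketch, with the noise model, layer, and sizes invented for illustration rather than taken from the thesis:

```python
import numpy as np

# Generic noise-aware forward pass (assumed form, not the thesis's framework):
# inject the modeled NVM variation into the weights on every forward pass so
# training sees the same perturbations the hardware will apply at inference.
rng = np.random.default_rng(2)

def noisy_forward(x, W, sigma=0.1):
    """One ReLU layer evaluated with multiplicative lognormal weight noise."""
    W_noisy = W * rng.lognormal(mean=0.0, sigma=sigma, size=W.shape)
    return np.maximum(W_noisy @ x, 0.0)

x = rng.normal(size=16)           # example input
W = rng.normal(size=(4, 16))      # layer weights to be mapped to conductances
y = noisy_forward(x, W)           # training would backpropagate through this
```

A network trained this way learns weights whose function is stable under the perturbation, which is the mechanism by which such frameworks recover inference accuracy lost to device non-idealities.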
In conclusion, covering modeling fundamentals, algorithm development, simulation, and application verification, this Ph.D. thesis comprehensively investigates software-hardware co-design challenges, taking a large step towards practical edge AIoT applications. |
first_indexed | 2024-10-01T06:07:32Z |
format | Thesis-Doctor of Philosophy |
id | ntu-10356/180196 |
institution | Nanyang Technological University |
language | English |
last_indexed | 2024-10-01T06:07:32Z |
publishDate | 2024 |
publisher | Nanyang Technological University |
record_format | dspace |
spelling | ntu-10356/180196 2024-09-27T15:43:21Z Energy-efficient AI hardware design for edge intelligence Cao, Tiancheng Goh Wang Ling School of Electrical and Electronic Engineering EWLGOH@ntu.edu.sg Engineering Doctor of Philosophy 2024-09-23T08:42:34Z 2024-09-23T08:42:34Z 2024 Thesis-Doctor of Philosophy Cao, T. (2024). Energy-efficient AI hardware design for edge intelligence. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/180196 https://hdl.handle.net/10356/180196 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |
spellingShingle | Engineering Cao, Tiancheng Energy-efficient AI hardware design for edge intelligence |
title | Energy-efficient AI hardware design for edge intelligence |
title_full | Energy-efficient AI hardware design for edge intelligence |
title_fullStr | Energy-efficient AI hardware design for edge intelligence |
title_full_unstemmed | Energy-efficient AI hardware design for edge intelligence |
title_short | Energy-efficient AI hardware design for edge intelligence |
title_sort | energy efficient ai hardware design for edge intelligence |
topic | Engineering |
url | https://hdl.handle.net/10356/180196 |
work_keys_str_mv | AT caotiancheng energyefficientaihardwaredesignforedgeintelligence |