Summary: | In the era of Artificial Intelligence (AI), AI applications have reached every facet of daily life, including the Internet of Things (IoT). However, the conventional von Neumann architecture used in AI hardware implementations poses challenges, particularly for energy-efficient edge devices, where computational resources are constrained. Researchers are actively exploring solutions that leverage Computing-in-Memory (CIM) and Non-Volatile Memory (NVM) to optimize AI algorithms for these resource-limited environments. Deep neural networks (DNNs), known for their remarkable performance in computer vision and natural language processing, demand substantial computational resources, which makes them difficult to deploy on edge devices. CIM and NVM technologies offer potential breakthroughs but face challenges such as high static power and random telegraph noise. Combined with DNNs, these innovations are poised to enable intelligent IoT applications that balance computational efficiency, memory constraints, and power consumption on edge devices. As another edge solution, Field-Programmable Gate Arrays (FPGAs) accelerate deep neural networks by enabling parallel processing at the hardware level, resulting in faster inference and lower latency than traditional CPU or GPU implementations. Additionally, FPGAs are reconfigurable, which allows hardware architectures to be tailored to specific neural network models, improving efficiency and reducing power consumption.
Adopting resistive non-volatile memory (NVM) devices, specifically memristor-based crossbar arrays, for neural network applications raises diverse challenges. These include mitigating non-idealities introduced during memristor fabrication, such as parasitic resistances, variations in device uniformity, and a finite number of resistance states, which can significantly distort output currents and undermine vector-matrix multiplication (VMM) accuracy. Furthermore, the pressing demand for energy-efficient AI algorithms in areas such as ECG signal analysis and edge AI devices intensifies the need to overcome these challenges. Capacitive crossbar arrays offer potential energy-efficiency advantages but struggle with issues such as precise capacitance tuning and voltage sensitivity, while RRAM-based neural networks hold promise for energy efficiency but must contend with readout errors, poor uniformity, and other non-ideal factors. Addressing these challenges is essential to harness the full potential of NVM devices and to meet the growing demand for efficient neural network computation in diverse applications. In parallel, there is a pressing need for an edge solution that leverages FPGA technology through high-level synthesis to deliver real-time data processing and optimized performance in resource-constrained environments.
This Ph.D. thesis comprehensively investigates the implementation of AI acceleration on edge devices. Chapter 1 introduces the background and motivation of the research. Chapter 2 reviews state-of-the-art CIM systems, AI algorithms, FPGA implementations, and applications. Chapter 3 proposes a hybrid complementary NVM modeling approach, which leads to significant improvements in hardware simulation speed and performance. Chapter 4 employs a general software-hardware co-design deep learning framework, implemented with NVM, to mitigate the impact of hardware non-idealities and reduce the degradation in inference accuracy. Chapter 5 adopts a more sophisticated deep learning framework, a customized PoolFormer implemented on a measured RRAM-based crossbar array, to realize a Transformer-based DNN on hardware. Chapter 6 implements AI algorithms on hardware through CIM design for various applications. Chapter 7 presents advances in high-level synthesis for FPGAs in edge applications such as speech recognition and signal processing. Finally, Chapter 8 concludes the thesis and proposes future work.
In conclusion, covering modeling fundamentals, algorithm development, simulation, and application verification, this Ph.D. thesis comprehensively investigates software-hardware co-design challenges, taking a large step towards practical edge AIoT applications.
|