Running CNNs efficiently on an FPGA


Bibliographic Details
Main Author: Yang, Shenghao
Other Authors: Weichen Liu
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2022
Subjects:
Online Access: https://hdl.handle.net/10356/156579
Description
Summary: With increased demand for AI at the edge, there is a pressing need to adapt ever more computationally demanding deep learning models for deployment on embedded devices. As accelerators for these networks, FPGAs are favoured for their energy efficiency and adaptability, but models must also be pre-processed before effective FPGA-based hardware accelerators can be designed. In this project, the author investigates the performance of Block-Balanced Sparsity, a model compression approach that prunes the parameter matrices of deep learning networks in a structured manner that allows for efficient FPGA accelerator implementations. By testing this approach across different pruning strategies, the author found that fine-tuning led to the highest model accuracy, gradual pruning allowed for the fastest model development, and learning-rate rewinding provided the greatest ease of use.
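To make the pruning scheme concrete, the following is a minimal sketch of block-balanced pruning, not the author's implementation: it assumes square blocks and a uniform per-block sparsity target, partitions a weight matrix into fixed-size blocks, and zeroes the smallest-magnitude entries within each block so every block retains the same number of nonzeros. That balanced structure is what lets an FPGA accelerator assign equal work per block.

```python
import numpy as np

def block_balanced_prune(weights, block_size=4, sparsity=0.5):
    """Prune `weights` so that every block_size x block_size block keeps
    the same number of largest-magnitude entries (block-balanced sparsity).

    This is an illustrative sketch; block shape and sparsity level are
    assumptions, not taken from the project itself.
    """
    rows, cols = weights.shape
    assert rows % block_size == 0 and cols % block_size == 0, \
        "matrix dimensions must be multiples of the block size"
    pruned = weights.copy()
    # number of entries to keep per block
    keep = int(block_size * block_size * (1.0 - sparsity))
    for i in range(0, rows, block_size):
        for j in range(0, cols, block_size):
            block = pruned[i:i + block_size, j:j + block_size]
            magnitudes = np.sort(np.abs(block).ravel())
            # threshold at the keep-th largest magnitude in this block
            cutoff = magnitudes[-keep]
            block[np.abs(block) < cutoff] = 0.0
    return pruned
```

Because every block ends up with the same nonzero count, the compressed matrix can be stored as fixed-length index/value lists per block, which maps naturally onto parallel FPGA processing elements.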