Flare: An FPGA-Based Full Precision Low Power CNN Accelerator with Reconfigurable Structure

Convolutional neural networks (CNNs) have significantly advanced various fields; however, their computational demands and power consumption have escalated, posing challenges for deployment in low-power scenarios. To address this issue and facilitate the application of CNNs in power-constrained environments, the development of dedicated CNN accelerators is crucial. Prior research has predominantly concentrated on low-precision CNN accelerators built from code generated by high-level synthesis (HLS) tools. Unfortunately, these approaches often fail to utilize the computational resources of field-programmable gate arrays (FPGAs) efficiently and do not extend well to full-precision scenarios. To overcome these limitations, we adopt the vector dot product as a unified primitive for both the convolution and the fully connected layers. By treating the row vector of the input feature maps as the fundamental processing unit, we balance processing latency against resource consumption while eliminating data rearrangement time. Furthermore, an accurate design space exploration (DSE) model is established to identify the optimal design point for each CNN layer, and dynamic partial reconfiguration is employed to maximize each layer’s access to computational resources. Our approach is validated through implementations of AlexNet and VGG16 on 7A100T and ZU15EG platforms, respectively, which achieve average convolutional-layer throughputs of 28.985 GOP/s and 246.711 GOP/s at full precision. Notably, the proposed accelerator demonstrates remarkable power efficiency, with a maximum improvement of 23.989 and 15.376 times over current state-of-the-art FPGA implementations.

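The core idea sketched in the abstract, that convolution and fully connected layers can both be served by one dot-product primitive operating on row vectors of the input feature maps, can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration rather than the authors' hardware design: the function names, tensor shapes, and stride/padding choices below are hypothetical and chosen only to show that both layer types reduce to the same np.dot call.

import numpy as np

def conv_as_row_dot_products(ifm, weights):
    # Hypothetical software model: ifm is (C_in, H, W), weights is
    # (C_out, C_in, K, K), stride 1, no padding. Each output pixel is one
    # dot product between a flattened kernel and a window gathered from K
    # consecutive row vectors of the input feature map.
    c_in, h, w = ifm.shape
    c_out, _, k, _ = weights.shape
    h_out, w_out = h - k + 1, w - k + 1
    ofm = np.zeros((c_out, h_out, w_out))
    for co in range(c_out):
        kernel = weights[co].reshape(-1)            # flatten once per filter
        for y in range(h_out):
            for x in range(w_out):
                window = ifm[:, y:y + k, x:x + k].reshape(-1)
                ofm[co, y, x] = np.dot(window, kernel)   # the shared primitive
    return ofm

def fc_as_dot_products(x, weights):
    # Fully connected layer: x is the flattened activation vector (N,),
    # weights is (M, N); every output neuron uses the same dot-product primitive.
    return np.array([np.dot(x, w_row) for w_row in weights])

# Tiny self-check of the convolution model's output shape.
ifm = np.random.rand(3, 8, 8)
w = np.random.rand(4, 3, 3, 3)
assert conv_as_row_dot_products(ifm, w).shape == (4, 6, 6)

In hardware, a single dot-product datapath of this form can be time-shared between the two layer types, and feeding it whole feature-map rows rather than individually gathered pixels is what the abstract credits with removing data rearrangement time; the mapping shown here is only a functional sketch of that idea.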

Bibliographic Details
Main Authors: Yuhua Xu, Jie Luo, Wei Sun
Affiliation: School of Electronics and Information Technology (School of Microelectronics), Sun Yat-sen University, Guangzhou 510275, China
Format: Article
Language: English
Published: MDPI AG, 2024-03-01
Series: Sensors
ISSN: 1424-8220
DOI: 10.3390/s24072239
Subjects: FPGA accelerator; convolutional neural networks; full precision; design space exploration; dynamic partial reconfiguration
Online Access: https://www.mdpi.com/1424-8220/24/7/2239