FPGAN: An FPGA Accelerator for Graph Attention Networks With Software and Hardware Co-Optimization


Bibliographic Details
Main Authors: Weian Yan, Weiqin Tong, Xiaoli Zhi
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Subjects: Graph attention networks, model optimization, inference accelerating, field programmable gate array, heterogeneous computing, parallel computing
Online Access: https://ieeexplore.ieee.org/document/9195849/
author Weian Yan
Weiqin Tong
Xiaoli Zhi
collection DOAJ
description Graph Attention Networks (GATs) exhibit outstanding performance on multiple authoritative node classification benchmarks, both transductive and inductive. This work implements FPGAN, an FPGA-based accelerator for graph attention networks that achieves significant improvements in performance and energy efficiency without losing accuracy relative to a PyTorch baseline. FPGAN eliminates the dependence on digital signal processors (DSPs) and large amounts of on-chip memory, and can work well even on low-end FPGA devices. We design FPGAN with software and hardware co-optimization across the full stack, from algorithm to architecture. Specifically, we compress the model to reduce its size, quantize features to enable fixed-point computation, replace multiply-accumulate (MAC) units with shift-addition units (SAUs) to eliminate the dependence on DSPs, and design an efficient algorithm to approximate the SoftMax function. We also adjust the activation functions and fuse operations to further reduce the computational requirements. Moreover, all data is vectorized and aligned for scalable vector computation and efficient memory access. All of these optimizations are integrated into a universal hardware pipeline that supports various GAT structures. We evaluate our design on an Inspur F10A board with an Intel Arria 10 GX1150 FPGA and 16 GB of DDR3 memory. Experimental results show that FPGAN achieves a 7.34x speedup over an Nvidia Tesla V100 and 593x over a Xeon Gold 5115 CPU while maintaining accuracy, with 48x and 2400x gains in energy efficiency, respectively.
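The shift-addition units (SAUs) described in the abstract rest on a standard idea: if each weight is quantized to a signed power of two, every multiply in a dot product becomes a bit shift, which needs no DSP blocks. The sketch below illustrates that general technique in software; the function names and quantization details are ours, not the paper's, and the actual FPGAN hardware design differs.

```python
import math

def quantize_pow2(w: float):
    """Round a weight to the nearest signed power of two.

    Returns (sign, exponent) with w ~= sign * 2**exponent; zero maps
    to sign 0 so the term can be skipped entirely.
    """
    if w == 0.0:
        return (0, 0)
    sign = 1 if w > 0 else -1
    return (sign, round(math.log2(abs(w))))

def shift_add_dot(features, quant_weights):
    """Dot product over nonnegative fixed-point (integer) features
    using only shifts and adds: multiplying by 2**e is a shift by e."""
    acc = 0
    for x, (sign, exp) in zip(features, quant_weights):
        if sign == 0:
            continue  # zero weight contributes nothing
        shifted = x << exp if exp >= 0 else x >> -exp
        acc += shifted if sign > 0 else -shifted
    return acc

# Example: weights [0.5, -2.0, 1.0] quantize exactly to powers of two,
# so the shift-add result matches the floating-point dot product.
weights = [quantize_pow2(w) for w in [0.5, -2.0, 1.0]]
print(shift_add_dot([8, 3, 5], weights))  # -> 3  (8*0.5 - 3*2 + 5*1)
```

Note that the right shift truncates when a weight's exponent is negative and the feature is not an exact multiple; a hardware design would fix a rounding policy for this, and weights that are not exact powers of two incur a quantization error that the paper's co-optimization must keep within accuracy bounds.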
format Article
id doaj.art-4419b29948614f4795d247af3830d366
institution Directory Open Access Journal
issn 2169-3536
language English
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-4419b29948614f4795d247af3830d366
DOI: 10.1109/ACCESS.2020.3023946 (IEEE Access, vol. 8, pp. 171608-171620, published 2020-01-01; ISSN 2169-3536; IEEE Xplore article 9195849)
FPGAN: An FPGA Accelerator for Graph Attention Networks With Software and Hardware Co-Optimization
Weian Yan (https://orcid.org/0000-0001-7249-6883), Weiqin Tong (https://orcid.org/0000-0001-8300-6376), Xiaoli Zhi (https://orcid.org/0000-0002-0615-2051)
Affiliation: School of Computer Engineering and Science, Shanghai University, Shanghai, China
https://ieeexplore.ieee.org/document/9195849/
title FPGAN: An FPGA Accelerator for Graph Attention Networks With Software and Hardware Co-Optimization
topic Graph attention networks
model optimization
inference accelerating
field programmable gate array
heterogeneous computing
parallel computing
url https://ieeexplore.ieee.org/document/9195849/