Efficient hardware-accelerated pseudoinverse computation through algorithm restructuring for parallelization in high-level synthesis

This paper describes a fast and efficient hardware-accelerated pseudoinverse computation through algorithm restructuring and leveraging FPGA synthesis directives for parallelism prior to high-level synthesis (HLS). The algorithm, which is composed of modified Gram–Schmidt QR decomposition (MGS-QRD),...


Bibliographic Details
Main Authors: Tan, Chong Yeam; Ooi, Chia Yee; Choo, Hau Sim; Ismail, Nordinah
Format: Article
Published: John Wiley and Sons Ltd 2022
Subjects: T Technology (General)
_version_ 1796866983862468608
author Tan, Chong Yeam
Ooi, Chia Yee
Choo, Hau Sim
Ismail, Nordinah
collection ePrints
description This paper describes a fast and efficient hardware-accelerated pseudoinverse computation, achieved by restructuring the algorithm and applying FPGA synthesis directives for parallelism prior to high-level synthesis (HLS). The algorithm, which is composed of modified Gram–Schmidt QR decomposition (MGS-QRD), triangular matrix inversion (TMI), and matrix multiplication (MM), is synthesized and implemented on a field-programmable gate array (FPGA). MGS-QRD is restructured and augmented with parallelism directives before synthesis, which yields an MGS-QRD hardware accelerator with high throughput. Modifications to the current TMI algorithm are also proposed: redundant computational tasks are removed to speed up the overall operation. Data dependencies in the MM algorithm were carefully considered so that appropriate parallelism directives could be inserted, and the data flow of MM was matched with that of the MGS-QRD and TMI modules to accelerate the pseudoinverse computation. The results show that the proposed pseudoinverse module outperforms a naïve implementation composed of an existing MGS-QRD, an existing TMI, and a standard MM in terms of maximum frequency (1.24× speedup), hardware resources (48% reduction in DSP usage), latency (23% reduction), and throughput (62% increase).
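The three stages named in the description can be illustrated in software. The following is a minimal sketch only, not the authors' HLS design: it assumes a real matrix A with full column rank and uses the textbook identity A⁺ = R⁻¹Qᵀ for A = QR, which the abstract does not state explicitly. All function names are hypothetical, and no parallelism directives or data-flow matching are modelled.

```python
def mgs_qr(a):
    """Modified Gram-Schmidt QR of an m x n matrix given as a list of rows.

    Assumes full column rank (illustrative assumption, not from the record).
    Returns (q, r) with q of size m x n and r upper triangular n x n.
    """
    m, n = len(a), len(a[0])
    v = [[float(a[i][j]) for i in range(m)] for j in range(n)]  # columns of A
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = sum(x * x for x in v[j]) ** 0.5
        qj = [x / r[j][j] for x in v[j]]
        q_cols.append(qj)
        # MGS step: remove the q_j component from every remaining column.
        for k in range(j + 1, n):
            r[j][k] = sum(qj[i] * v[k][i] for i in range(m))
            v[k] = [v[k][i] - r[j][k] * qj[i] for i in range(m)]
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


def upper_tri_inverse(r):
    """Invert an upper triangular matrix by back substitution, column by column."""
    n = len(r)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / r[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(r[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = -s / r[i][i]
    return inv


def matmul(x, y):
    """Plain triple-loop matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def pseudoinverse(a):
    """Compute A+ = R^-1 * Q^T from the MGS-QRD, TMI, and MM stages above."""
    q, r = mgs_qr(a)
    q_t = [list(col) for col in zip(*q)]  # transpose of Q
    return matmul(upper_tri_inverse(r), q_t)
```

For a 3 × 2 input such as `[[1, 0], [0, 1], [1, 1]]`, `pseudoinverse` returns a 2 × 3 matrix whose product with the input is, up to rounding, the 2 × 2 identity.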
first_indexed 2024-03-05T21:20:22Z
format Article
id utm.eprints-101017
institution Universiti Teknologi Malaysia - ePrints
last_indexed 2024-03-05T21:20:22Z
publishDate 2022
publisher John Wiley and Sons Ltd
record_format dspace
spelling utm.eprints-101017 2023-05-23T10:37:43Z http://eprints.utm.my/101017/ T Technology (General) John Wiley and Sons Ltd 2022-02 Article PeerReviewed Tan, Chong Yeam and Ooi, Chia Yee and Choo, Hau Sim and Ismail, Nordinah (2022) Efficient hardware-accelerated pseudoinverse computation through algorithm restructuring for parallelization in high-level synthesis. International Journal of Circuit Theory and Applications, 50 (2). pp. 394-416. ISSN 0098-9886 http://dx.doi.org/10.1002/cta.3155 DOI: 10.1002/cta.3155
title Efficient hardware-accelerated pseudoinverse computation through algorithm restructuring for parallelization in high-level synthesis
topic T Technology (General)