An Efficient Method of Parallel Multiplication on a Single DSP Slice for Embedded FPGAs

Field-programmable gate arrays (FPGAs) can efficiently implement custom applications using their embedded digital signal processor (DSP) slices, which include binary multipliers. A larger number of binary multipliers in a DSP slice usually indicates that it has the capacity to process as...

Bibliographic Details
Main Authors:	Zhangqin Huang, Shuo Zhang, Weidong Wang
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	DSP slice; FPGAs; multiplication; performance optimization; compute resource
Online Access:	https://ieeexplore.ieee.org/document/8772177/

_version_	1818566463981617152
author	Zhangqin Huang, Shuo Zhang, Weidong Wang
collection	DOAJ
description	Field-programmable gate arrays (FPGAs) can efficiently implement custom applications using their embedded digital signal processor (DSP) slices, which include binary multipliers. A larger number of binary multipliers in a DSP slice usually indicates that it can process more multiplication operations in one clock cycle. To fully utilize the DSP resource, we propose a novel DSP slice optimization method, PMSDS, that achieves parallel multiplication on a single DSP slice. First, PMSDS splits the multiplicators into two separate parts, valid bits and vacant bits, using a customized polynomial algebra method. Then, PMSDS pre-calculates the maximum number of overflow bits using the same polynomial algebra method. Finally, it computes the total bit width of the multiplicators and parallelizes the final multiplicators. We also propose an optimization model that finds the best parallel solution according to the performance and precision of a single DSP slice. Moreover, we implement a PMSDS-based matrix multiplication algorithm that supports dynamically changing computing precision. Experiments on large-scale, real-world matrix multiplication show that PMSDS achieves better latency and resource utilization than the traditional, add-tree, and full-unroll methods, and better frequency and dynamic power consumption than state-of-the-art methods.
first_indexed	2024-12-14T01:54:00Z
format	Article
id	doaj.art-33af55334dff442aa3fce650666527c8
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-12-14T01:54:00Z
publishDate	2019-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-33af55334dff442aa3fce650666527c8; 2022-12-21T23:21:16Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2019-01-01; vol. 7, pp. 100993-101008; DOI 10.1109/ACCESS.2019.2931161; document 8772177; An Efficient Method of Parallel Multiplication on a Single DSP Slice for Embedded FPGAs; Zhangqin Huang, Shuo Zhang (https://orcid.org/0000-0002-0892-8642), Weidong Wang (https://orcid.org/0000-0002-7378-2766); Beijing Engineering Research Center for IoT Software and Systems, Beijing University of Technology, Beijing, China (all three authors); https://ieeexplore.ieee.org/document/8772177/; DSP slice; FPGAs; multiplication; performance optimization; compute resource
title	An Efficient Method of Parallel Multiplication on a Single DSP Slice for Embedded FPGAs
topic	DSP slice; FPGAs; multiplication; performance optimization; compute resource
url	https://ieeexplore.ieee.org/document/8772177/
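
The description field above outlines packing several multiplications into one DSP slice by separating valid bits from vacant bits and bounding the overflow bits. The following Python sketch illustrates only the general operand-packing idea behind that description, not the paper's PMSDS algorithm. It assumes unsigned operands and a shared multiplicand; the function name `packed_multiply` and the `width` parameter are hypothetical and chosen for illustration.

```python
from typing import Tuple


def packed_multiply(a: int, b: int, c: int, width: int = 8) -> Tuple[int, int]:
    """Return (a*c, b*c) using a single wide multiplication.

    Assumption: a, b and c are unsigned integers of at most `width` bits.
    """
    limit = 1 << width
    for value in (a, b, c):
        if not 0 <= value < limit:
            raise ValueError(f"operands must fit in {width} unsigned bits")

    # Each partial product needs at most 2*width bits, so shifting `a`
    # by that amount leaves enough vacant (zero) bits above `b` for
    # b*c to grow into without touching the bits of a*c.
    shift = 2 * width
    packed = (a << shift) | b

    # The one multiplication: (a * 2**shift + b) * c
    #                       = (a*c) * 2**shift + (b*c)
    wide = packed * c

    low = wide & ((1 << shift) - 1)  # recovers b*c
    high = wide >> shift             # recovers a*c
    return high, low


if __name__ == "__main__":
    print(packed_multiply(3, 5, 7))
```

In this sketch the padding is sized for the worst case, so the two results never overlap. A real DSP slice has a fixed multiplier width, so the operand width and padding would have to be chosen to fit it, and signed operands would need an additional correction step that is not shown here.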

Similar Items