An Efficient Method of Parallel Multiplication on a Single DSP Slice for Embedded FPGAs

Field-programmable gate arrays (FPGAs) can efficiently implement custom applications using their embedded digital signal processor (DSP) slices, which include binary multipliers. A larger number of binary multipliers in a DSP slice usually indicates that it has the capacity to process as...

Bibliographic Details
Main Authors: Zhangqin Huang, Shuo Zhang, Weidong Wang
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: DSP slice; FPGAs; multiplication; performance optimization; compute resource
Online Access: https://ieeexplore.ieee.org/document/8772177/
author Zhangqin Huang
Shuo Zhang
Weidong Wang
collection DOAJ
description Field-programmable gate arrays (FPGAs) can efficiently implement custom applications using their embedded digital signal processor (DSP) slices, which include binary multipliers. A larger number of binary multipliers in a DSP slice usually indicates that it has the capacity to process as many multiplication operations as possible in one clock cycle. To fully utilize the DSP resource, in this paper we propose a novel DSP slice optimization method, named PMSDS, that achieves parallel multiplication on a single DSP slice. First, PMSDS splits the multiplicators into two separate parts, valid bits and vacant bits, using a customized polynomial algebra method. Then, PMSDS pre-calculates the maximum number of overflow bits using the same polynomial algebra method. Finally, it computes the total bit count of the multiplicators and parallelizes the final multiplicators. We also propose an optimization model that finds the best parallel solution according to the performance and precision of a single DSP slice. Moreover, we implement a PMSDS-based matrix multiplication algorithm that supports dynamically changing computing precision. Experiments on a large-scale, real-world matrix multiplication show that PMSDS achieves better latency and resource utilization than the traditional, add-tree, and full-unroll methods, and better frequency and dynamic power consumption than the state-of-the-art methods.
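The general idea of sharing one wide multiplier among several narrow products can be sketched as follows. This is a generic operand-packing illustration, not the PMSDS algorithm itself: the record does not give the paper's polynomial algebra or optimization model. The function name `pack_multiply`, the parameters `a_bits` and `b_bits`, and the restriction to unsigned operands with a shared multiplicand are all assumptions made for the example.

```python
def pack_multiply(a0, a1, b, a_bits=8, b_bits=8):
    """Compute a0*b and a1*b with a single wide multiplication.

    Assumption (not from the paper): operands are unsigned, with
    0 <= a0, a1 < 2**a_bits and 0 <= b < 2**b_bits.
    """
    # A product of an a_bits value and a b_bits value needs at most
    # a_bits + b_bits bits, so shifting a1 left by that amount leaves
    # enough vacant bits that the two products cannot overlap.
    shift = a_bits + b_bits
    packed = (a1 << shift) | a0      # [valid bits of a1][vacant bits][a0]
    wide = packed * b                # the one multiplication
    mask = (1 << shift) - 1
    return wide & mask, wide >> shift  # (a0*b, a1*b)


p0, p1 = pack_multiply(200, 17, 251)
print(p0, p1)
```

With 8-bit inputs the packed operand is 24 bits wide, so whether a single hardware multiplier can hold it depends on the port widths of the target DSP slice. Signed operands would need additional correction terms that this sketch omits.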
first_indexed 2024-12-14T01:54:00Z
format Article
id doaj.art-33af55334dff442aa3fce650666527c8
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-14T01:54:00Z
publishDate 2019-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-33af55334dff442aa3fce650666527c8 2022-12-21T23:21:16Z eng
IEEE, IEEE Access, ISSN 2169-3536, 2019-01-01, vol. 7, pp. 100993-101008
DOI 10.1109/ACCESS.2019.2931161, document 8772177
An Efficient Method of Parallel Multiplication on a Single DSP Slice for Embedded FPGAs
Zhangqin Huang (0)
Shuo Zhang (1) https://orcid.org/0000-0002-0892-8642
Weidong Wang (2) https://orcid.org/0000-0002-7378-2766
(0, 1, 2) Beijing Engineering Research Center for IoT Software and Systems, Beijing University of Technology, Beijing, China
https://ieeexplore.ieee.org/document/8772177/
DSP slice; FPGAs; multiplication; performance optimization; compute resource
title An Efficient Method of Parallel Multiplication on a Single DSP Slice for Embedded FPGAs
topic DSP slice
FPGAs
multiplication
performance optimization
compute resource
url https://ieeexplore.ieee.org/document/8772177/