An Efficient Method of Parallel Multiplication on a Single DSP Slice for Embedded FPGAs
Field-programmable gate arrays (FPGAs) can efficiently implement custom applications using their embedded digital signal processor (DSP) slices, which include binary multipliers. A larger number of binary multipliers in a DSP slice usually indicates that it has the capacity to process as...
Main Authors: | Zhangqin Huang, Shuo Zhang, Weidong Wang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
Subjects: | DSP slice; FPGAs; multiplication; performance optimization; compute resource |
Online Access: | https://ieeexplore.ieee.org/document/8772177/ |
_version_ | 1818566463981617152 |
---|---|
author | Zhangqin Huang; Shuo Zhang; Weidong Wang |
author_facet | Zhangqin Huang; Shuo Zhang; Weidong Wang |
author_sort | Zhangqin Huang |
collection | DOAJ |
description | Field-programmable gate arrays (FPGAs) can efficiently implement custom applications using their embedded digital signal processor (DSP) slices, which include binary multipliers. A larger number of binary multipliers in a DSP slice usually indicates that the slice has the capacity to process as many multiplication operations as possible in one clock cycle. To fully utilize the DSP resource, this paper proposes a novel DSP slice optimization method, named PMSDS, that achieves parallel multiplication on a single DSP slice. First, PMSDS splits the multiplicators into two separate parts, i.e., valid bits and vacant bits, using a customized polynomial algebra method. Then, PMSDS pre-calculates the maximum number of overflow bits using the same polynomial algebra method. Finally, it computes the total bit width of the multiplicators and parallelizes the final multiplicators. We also propose an optimization model that finds the best parallel solution according to the performance and precision of a single DSP slice. Moreover, we implement a PMSDS-based matrix multiplication algorithm that supports dynamically changing computing precision. Experiments on a large-scale, real-world matrix multiplication show that PMSDS achieves better latency and resource utilization than the traditional, add-tree, and full-unroll methods, and better frequency and dynamic power consumption than the state-of-the-art methods. |
first_indexed | 2024-12-14T01:54:00Z |
format | Article |
id | doaj.art-33af55334dff442aa3fce650666527c8 |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-14T01:54:00Z |
publishDate | 2019-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-33af55334dff442aa3fce650666527c8; 2022-12-21T23:21:16Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; 7; 100993; 101008; 10.1109/ACCESS.2019.2931161; 8772177; An Efficient Method of Parallel Multiplication on a Single DSP Slice for Embedded FPGAs; Zhangqin Huang (0); Shuo Zhang (1) https://orcid.org/0000-0002-0892-8642; Weidong Wang (2) https://orcid.org/0000-0002-7378-2766; (0, 1, 2) Beijing Engineering Research Center for IoT Software and Systems, Beijing University of Technology, Beijing, China; https://ieeexplore.ieee.org/document/8772177/; DSP slice; FPGAs; multiplication; performance optimization; compute resource |
title | An Efficient Method of Parallel Multiplication on a Single DSP Slice for Embedded FPGAs |
title_sort | efficient method of parallel multiplication on a single dsp slice for embedded fpgas |
topic | DSP slice; FPGAs; multiplication; performance optimization; compute resource |
url | https://ieeexplore.ieee.org/document/8772177/ |
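
The abstract describes packing several multiplications into one DSP multiplier by separating each operand into valid bits and vacant bits. The following is a minimal sketch of that general operand-packing idea only, not the PMSDS algorithm itself: it assumes unsigned operands, a shared second factor `c`, and a hypothetical helper name `packed_multiply`, none of which come from the record. Two values `a` and `b` are placed in one wide word with enough vacant bits between them that the two partial products cannot overlap, so a single multiplication yields both `a*c` and `b*c`.

```python
def packed_multiply(a: int, b: int, c: int, width: int) -> tuple[int, int]:
    """Return (a*c, b*c) using a single integer multiplication.

    Illustrative assumption: a, b and c are unsigned and each fits in
    `width` bits. This is a generic packing sketch, not the PMSDS method.
    """
    limit = 1 << width
    for value in (a, b, c):
        if not 0 <= value < limit:
            raise ValueError(f"operand {value} does not fit in {width} unsigned bits")

    # A product of two width-bit values needs at most 2*width bits, so
    # placing `a` 2*width bits above `b` leaves enough vacant bits for
    # b*c to grow without spilling into the a*c field.
    shift = 2 * width
    packed = (a << shift) | b

    # The one multiplication: (a * 2**shift + b) * c = (a*c) * 2**shift + b*c
    product = packed * c

    # Unpack the two independent results from their bit fields.
    low_mask = (1 << shift) - 1
    return product >> shift, product & low_mask


if __name__ == "__main__":
    high, low = packed_multiply(13, 7, 11, width=4)
    print(high, low)
```

In this sketch the packed operand occupies `3 * width` bits, so the usable `width` is bounded by the multiplier's input port size. Signed operands or accumulated sums would need extra overflow bits, which is the kind of budget the abstract says PMSDS pre-calculates.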