An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP
Matrix multiplication is an important operation for many engineering applications. Sometimes new features that include matrix multiplication should be added to existing and even out-of-date embedded platforms. In this paper, an unusual problem is considered: how to implement matrix multiplication of...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2022-12-01
|
Series: | Electronics |
Subjects: | |
Online Access: | https://www.mdpi.com/2079-9292/12/1/78 |
_version_ | 1797625955185852416 |
---|---|
author | Ilia Safonov Anton Kornilov Daria Makienko |
author_facet | Ilia Safonov Anton Kornilov Daria Makienko |
author_sort | Ilia Safonov |
collection | DOAJ |
description | Matrix multiplication is an important operation for many engineering applications. Sometimes new features that include matrix multiplication should be added to existing and even out-of-date embedded platforms. In this paper, an unusual problem is considered: how to implement matrix multiplication of 32-bit signed integers and fixed-point numbers on DSP having SIMD instructions for 16-bit integers only. For examined tasks, matrix size may vary from several tens to two hundred. The proposed mathematical approach for dense rectangular matrix multiplication of 32-bit numbers comprises decomposition of 32-bit matrices to matrices of 16-bit numbers, four matrix multiplications of 16-bit unsigned integers via outer product, and correction of outcome for signed integers and fixed point numbers. Several tricks for performance optimization are analyzed. In addition, ways for block-wise and parallel implementations are described. An implementation of the proposed method by means of 16-bit vector instructions is faster than matrix multiplication using 32-bit scalar instructions and demonstrates performance close to a theoretically achievable limit. The described technique can be generalized for matrix multiplication of n-bit integers and fixed point numbers via handling with matrices of n/2-bit integers. In conclusion, recommendations for practitioners who work on implementation of matrix multiplication for various DSP are presented. |
first_indexed | 2024-03-11T10:03:48Z |
format | Article |
id | doaj.art-4de75149960b44c59eaaac8a49e0b5f8 |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-11T10:03:48Z |
publishDate | 2022-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-4de75149960b44c59eaaac8a49e0b5f82023-11-16T15:10:51ZengMDPI AGElectronics2079-92922022-12-011217810.3390/electronics12010078An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSPIlia Safonov0Anton Kornilov1Daria Makienko2Computer Science and Control Systems Department, National Research Nuclear University MEPhI, Kashirskoye Highway, 31, 115409 Moscow, RussiaComputer Science and Control Systems Department, National Research Nuclear University MEPhI, Kashirskoye Highway, 31, 115409 Moscow, RussiaComputer Science and Control Systems Department, National Research Nuclear University MEPhI, Kashirskoye Highway, 31, 115409 Moscow, RussiaMatrix multiplication is an important operation for many engineering applications. Sometimes new features that include matrix multiplication should be added to existing and even out-of-date embedded platforms. In this paper, an unusual problem is considered: how to implement matrix multiplication of 32-bit signed integers and fixed-point numbers on DSP having SIMD instructions for 16-bit integers only. For examined tasks, matrix size may vary from several tens to two hundred. The proposed mathematical approach for dense rectangular matrix multiplication of 32-bit numbers comprises decomposition of 32-bit matrices to matrices of 16-bit numbers, four matrix multiplications of 16-bit unsigned integers via outer product, and correction of outcome for signed integers and fixed point numbers. Several tricks for performance optimization are analyzed. In addition, ways for block-wise and parallel implementations are described. An implementation of the proposed method by means of 16-bit vector instructions is faster than matrix multiplication using 32-bit scalar instructions and demonstrates performance close to a theoretically achievable limit. The described technique can be generalized for matrix multiplication of n-bit integers and fixed point numbers via handling with matrices of n/2-bit integers. In conclusion, recommendations for practitioners who work on implementation of matrix multiplication for various DSP are presented.https://www.mdpi.com/2079-9292/12/1/78GEMMSIMD instructionsouter productfixed pointDSPparallel processing |
spellingShingle | Ilia Safonov Anton Kornilov Daria Makienko An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP Electronics GEMM SIMD instructions outer product fixed point DSP parallel processing |
title | An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP |
title_full | An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP |
title_fullStr | An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP |
title_full_unstemmed | An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP |
title_short | An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP |
title_sort | approach for matrix multiplication of 32 bit fixed point numbers by means of 16 bit simd instructions on dsp |
topic | GEMM SIMD instructions outer product fixed point DSP parallel processing |
url | https://www.mdpi.com/2079-9292/12/1/78 |
work_keys_str_mv | AT iliasafonov anapproachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp AT antonkornilov anapproachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp AT dariamakienko anapproachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp AT iliasafonov approachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp AT antonkornilov approachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp AT dariamakienko approachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp |