An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP

Matrix multiplication is an important operation for many engineering applications. Sometimes new features that include matrix multiplication should be added to existing and even out-of-date embedded platforms. In this paper, an unusual problem is considered: how to implement matrix multiplication of...

Full description

Bibliographic Details
Main Authors:	Ilia Safonov, Anton Kornilov, Daria Makienko
Format:	Article
Language:	English
Published:	MDPI AG 2022-12-01
Series:	Electronics
Subjects:	GEMM SIMD instructions outer product fixed point DSP parallel processing
Online Access:	https://www.mdpi.com/2079-9292/12/1/78

_version_	1797625955185852416
author	Ilia Safonov Anton Kornilov Daria Makienko
author_facet	Ilia Safonov Anton Kornilov Daria Makienko
author_sort	Ilia Safonov
collection	DOAJ
description	Matrix multiplication is an important operation for many engineering applications. Sometimes new features that include matrix multiplication should be added to existing and even out-of-date embedded platforms. In this paper, an unusual problem is considered: how to implement matrix multiplication of 32-bit signed integers and fixed-point numbers on DSP having SIMD instructions for 16-bit integers only. For examined tasks, matrix size may vary from several tens to two hundred. The proposed mathematical approach for dense rectangular matrix multiplication of 32-bit numbers comprises decomposition of 32-bit matrices to matrices of 16-bit numbers, four matrix multiplications of 16-bit unsigned integers via outer product, and correction of outcome for signed integers and fixed point numbers. Several tricks for performance optimization are analyzed. In addition, ways for block-wise and parallel implementations are described. An implementation of the proposed method by means of 16-bit vector instructions is faster than matrix multiplication using 32-bit scalar instructions and demonstrates performance close to a theoretically achievable limit. The described technique can be generalized for matrix multiplication of n-bit integers and fixed point numbers via handling with matrices of n/2-bit integers. In conclusion, recommendations for practitioners who work on implementation of matrix multiplication for various DSP are presented.
first_indexed	2024-03-11T10:03:48Z
format	Article
id	doaj.art-4de75149960b44c59eaaac8a49e0b5f8
institution	Directory Open Access Journal
issn	2079-9292
language	English
last_indexed	2024-03-11T10:03:48Z
publishDate	2022-12-01
publisher	MDPI AG
record_format	Article
series	Electronics
spelling	doaj.art-4de75149960b44c59eaaac8a49e0b5f82023-11-16T15:10:51ZengMDPI AGElectronics2079-92922022-12-011217810.3390/electronics12010078An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSPIlia Safonov0Anton Kornilov1Daria Makienko2Computer Science and Control Systems Department, National Research Nuclear University MEPhI, Kashirskoye Highway, 31, 115409 Moscow, RussiaComputer Science and Control Systems Department, National Research Nuclear University MEPhI, Kashirskoye Highway, 31, 115409 Moscow, RussiaComputer Science and Control Systems Department, National Research Nuclear University MEPhI, Kashirskoye Highway, 31, 115409 Moscow, RussiaMatrix multiplication is an important operation for many engineering applications. Sometimes new features that include matrix multiplication should be added to existing and even out-of-date embedded platforms. In this paper, an unusual problem is considered: how to implement matrix multiplication of 32-bit signed integers and fixed-point numbers on DSP having SIMD instructions for 16-bit integers only. For examined tasks, matrix size may vary from several tens to two hundred. The proposed mathematical approach for dense rectangular matrix multiplication of 32-bit numbers comprises decomposition of 32-bit matrices to matrices of 16-bit numbers, four matrix multiplications of 16-bit unsigned integers via outer product, and correction of outcome for signed integers and fixed point numbers. Several tricks for performance optimization are analyzed. In addition, ways for block-wise and parallel implementations are described. An implementation of the proposed method by means of 16-bit vector instructions is faster than matrix multiplication using 32-bit scalar instructions and demonstrates performance close to a theoretically achievable limit. The described technique can be generalized for matrix multiplication of n-bit integers and fixed point numbers via handling with matrices of n/2-bit integers. In conclusion, recommendations for practitioners who work on implementation of matrix multiplication for various DSP are presented.https://www.mdpi.com/2079-9292/12/1/78GEMMSIMD instructionsouter productfixed pointDSPparallel processing
spellingShingle	Ilia Safonov Anton Kornilov Daria Makienko An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP Electronics GEMM SIMD instructions outer product fixed point DSP parallel processing
title	An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP
title_full	An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP
title_fullStr	An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP
title_full_unstemmed	An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP
title_short	An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP
title_sort	approach for matrix multiplication of 32 bit fixed point numbers by means of 16 bit simd instructions on dsp
topic	GEMM SIMD instructions outer product fixed point DSP parallel processing
url	https://www.mdpi.com/2079-9292/12/1/78
work_keys_str_mv	AT iliasafonov anapproachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp AT antonkornilov anapproachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp AT dariamakienko anapproachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp AT iliasafonov approachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp AT antonkornilov approachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp AT dariamakienko approachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp

An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP

Similar Items