An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP

Matrix multiplication is an important operation for many engineering applications. Sometimes new features that include matrix multiplication should be added to existing and even out-of-date embedded platforms. In this paper, an unusual problem is considered: how to implement matrix multiplication of...

Full description

Bibliographic Details
Main Authors: Ilia Safonov, Anton Kornilov, Daria Makienko
Format: Article
Language:English
Published: MDPI AG 2022-12-01
Series:Electronics
Subjects:
Online Access:https://www.mdpi.com/2079-9292/12/1/78
_version_ 1797625955185852416
author Ilia Safonov
Anton Kornilov
Daria Makienko
author_facet Ilia Safonov
Anton Kornilov
Daria Makienko
author_sort Ilia Safonov
collection DOAJ
description Matrix multiplication is an important operation for many engineering applications. Sometimes new features that include matrix multiplication should be added to existing and even out-of-date embedded platforms. In this paper, an unusual problem is considered: how to implement matrix multiplication of 32-bit signed integers and fixed-point numbers on DSP having SIMD instructions for 16-bit integers only. For examined tasks, matrix size may vary from several tens to two hundred. The proposed mathematical approach for dense rectangular matrix multiplication of 32-bit numbers comprises decomposition of 32-bit matrices to matrices of 16-bit numbers, four matrix multiplications of 16-bit unsigned integers via outer product, and correction of outcome for signed integers and fixed point numbers. Several tricks for performance optimization are analyzed. In addition, ways for block-wise and parallel implementations are described. An implementation of the proposed method by means of 16-bit vector instructions is faster than matrix multiplication using 32-bit scalar instructions and demonstrates performance close to a theoretically achievable limit. The described technique can be generalized for matrix multiplication of n-bit integers and fixed point numbers via handling with matrices of n/2-bit integers. In conclusion, recommendations for practitioners who work on implementation of matrix multiplication for various DSP are presented.
first_indexed 2024-03-11T10:03:48Z
format Article
id doaj.art-4de75149960b44c59eaaac8a49e0b5f8
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-11T10:03:48Z
publishDate 2022-12-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-4de75149960b44c59eaaac8a49e0b5f82023-11-16T15:10:51ZengMDPI AGElectronics2079-92922022-12-011217810.3390/electronics12010078An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSPIlia Safonov0Anton Kornilov1Daria Makienko2Computer Science and Control Systems Department, National Research Nuclear University MEPhI, Kashirskoye Highway, 31, 115409 Moscow, RussiaComputer Science and Control Systems Department, National Research Nuclear University MEPhI, Kashirskoye Highway, 31, 115409 Moscow, RussiaComputer Science and Control Systems Department, National Research Nuclear University MEPhI, Kashirskoye Highway, 31, 115409 Moscow, RussiaMatrix multiplication is an important operation for many engineering applications. Sometimes new features that include matrix multiplication should be added to existing and even out-of-date embedded platforms. In this paper, an unusual problem is considered: how to implement matrix multiplication of 32-bit signed integers and fixed-point numbers on DSP having SIMD instructions for 16-bit integers only. For examined tasks, matrix size may vary from several tens to two hundred. The proposed mathematical approach for dense rectangular matrix multiplication of 32-bit numbers comprises decomposition of 32-bit matrices to matrices of 16-bit numbers, four matrix multiplications of 16-bit unsigned integers via outer product, and correction of outcome for signed integers and fixed point numbers. Several tricks for performance optimization are analyzed. In addition, ways for block-wise and parallel implementations are described. An implementation of the proposed method by means of 16-bit vector instructions is faster than matrix multiplication using 32-bit scalar instructions and demonstrates performance close to a theoretically achievable limit. The described technique can be generalized for matrix multiplication of n-bit integers and fixed point numbers via handling with matrices of n/2-bit integers. In conclusion, recommendations for practitioners who work on implementation of matrix multiplication for various DSP are presented.https://www.mdpi.com/2079-9292/12/1/78GEMMSIMD instructionsouter productfixed pointDSPparallel processing
spellingShingle Ilia Safonov
Anton Kornilov
Daria Makienko
An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP
Electronics
GEMM
SIMD instructions
outer product
fixed point
DSP
parallel processing
title An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP
title_full An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP
title_fullStr An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP
title_full_unstemmed An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP
title_short An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP
title_sort approach for matrix multiplication of 32 bit fixed point numbers by means of 16 bit simd instructions on dsp
topic GEMM
SIMD instructions
outer product
fixed point
DSP
parallel processing
url https://www.mdpi.com/2079-9292/12/1/78
work_keys_str_mv AT iliasafonov anapproachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp
AT antonkornilov anapproachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp
AT dariamakienko anapproachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp
AT iliasafonov approachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp
AT antonkornilov approachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp
AT dariamakienko approachformatrixmultiplicationof32bitfixedpointnumbersbymeansof16bitsimdinstructionsondsp