TEB: Efficient SpMV Storage Format for Matrix Decomposition and Reconstruction on GPU
Sparse matrix-vector multiplication (SpMV) is a crucial computing process in the field of science and engineering. CSR (compressed sparse row) format is one of the most commonly used storage formats for sparse matrix. In the process of implementing parallel SpMV on the graphics processing unit (GPU)...
Main Author: | |
---|---|
Format: | Article |
Language: | zho |
Published: |
Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
2024-04-01
|
Series: | Jisuanji kexue yu tansuo |
Subjects: | |
Online Access: | http://fcst.ceaj.org/fileup/1673-9418/PDF/2304039.pdf |
_version_ | 1797231006553473024 |
---|---|
author | WANG Yuhua, ZHANG Yuqi, HE Junfei, XU Yuezhu, CUI Huanyu |
author_facet | WANG Yuhua, ZHANG Yuqi, HE Junfei, XU Yuezhu, CUI Huanyu |
author_sort | WANG Yuhua, ZHANG Yuqi, HE Junfei, XU Yuezhu, CUI Huanyu |
collection | DOAJ |
description | Sparse matrix-vector multiplication (SpMV) is a crucial computing process in the field of science and engineering. CSR (compressed sparse row) format is one of the most commonly used storage formats for sparse matrix. In the process of implementing parallel SpMV on the graphics processing unit (GPU), it only stores non-zero elements of sparse matrix, avoiding computational redundancy caused by zero element filling, and saving storage space. But there is a problem of load imbalance, which wastes computing resources. To address the aforementioned issues, storage formats with good performance in recent years have been studied, and a row by row decomposition and reorganization storage format—TEB (threshold-exchangeorder block) format has been proposed. The format first uses a heuristic threshold selection algorithm to determine the appropriate segmentation threshold, and combines the row merging algorithm based on reordering to reconstruct and decompose the sparse matrix, so that the number of non-zero elements between blocks is as close as possible. Furthermore, combined with CUDA (computer unified device architecture) thread technology, a parallel SpMV algorithm between sub blocks based on TEB storage format is proposed, which can reasonably allocate computing resources and solve the problem of load imbalance, thus improving the parallel computing efficiency of SpMV. In order to verify the effectiveness of the TEB storage format, experiments are conducted on the NVIDIA Tesla V100 platform. The results show that compared to PBC (partition-block-CSR), AMF-CSR (adaptive multi-row folding of CSR), CSR-Scalar (compressed sparse row-scalar), and CSR5 (compressed sparse row 5) storage formats, TEB can improve SpMV time performance by an average of 3.23×, 5.83×, 2.33×, and 2.21×. In terms of floating-point computing performance, the average improvement can be 3.36×, 5.95×, 2.29×, and 2.13× |
first_indexed | 2024-04-24T15:37:31Z |
format | Article |
id | doaj.art-9297e580bf2045719bc7b8950dbba89d |
institution | Directory Open Access Journal |
issn | 1673-9418 |
language | zho |
last_indexed | 2024-04-24T15:37:31Z |
publishDate | 2024-04-01 |
publisher | Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press |
record_format | Article |
series | Jisuanji kexue yu tansuo |
spelling | doaj.art-9297e580bf2045719bc7b8950dbba89d2024-04-02T01:27:23ZzhoJournal of Computer Engineering and Applications Beijing Co., Ltd., Science PressJisuanji kexue yu tansuo1673-94182024-04-011841094110810.3778/j.issn.1673-9418.2304039TEB: Efficient SpMV Storage Format for Matrix Decomposition and Reconstruction on GPUWANG Yuhua, ZHANG Yuqi, HE Junfei, XU Yuezhu, CUI Huanyu0School of Computer Science and Technology, Harbin Engineering University, Harbin 150000, ChinaSparse matrix-vector multiplication (SpMV) is a crucial computing process in the field of science and engineering. CSR (compressed sparse row) format is one of the most commonly used storage formats for sparse matrix. In the process of implementing parallel SpMV on the graphics processing unit (GPU), it only stores non-zero elements of sparse matrix, avoiding computational redundancy caused by zero element filling, and saving storage space. But there is a problem of load imbalance, which wastes computing resources. To address the aforementioned issues, storage formats with good performance in recent years have been studied, and a row by row decomposition and reorganization storage format—TEB (threshold-exchangeorder block) format has been proposed. The format first uses a heuristic threshold selection algorithm to determine the appropriate segmentation threshold, and combines the row merging algorithm based on reordering to reconstruct and decompose the sparse matrix, so that the number of non-zero elements between blocks is as close as possible. Furthermore, combined with CUDA (computer unified device architecture) thread technology, a parallel SpMV algorithm between sub blocks based on TEB storage format is proposed, which can reasonably allocate computing resources and solve the problem of load imbalance, thus improving the parallel computing efficiency of SpMV. In order to verify the effectiveness of the TEB storage format, experiments are conducted on the NVIDIA Tesla V100 platform. The results show that compared to PBC (partition-block-CSR), AMF-CSR (adaptive multi-row folding of CSR), CSR-Scalar (compressed sparse row-scalar), and CSR5 (compressed sparse row 5) storage formats, TEB can improve SpMV time performance by an average of 3.23×, 5.83×, 2.33×, and 2.21×. In terms of floating-point computing performance, the average improvement can be 3.36×, 5.95×, 2.29×, and 2.13×http://fcst.ceaj.org/fileup/1673-9418/PDF/2304039.pdfsparse matrix-vector multiplication (spmv); reorder; compressed sparse row (csr) format; load balancing; storage format; graphics processing unit (gpu) |
spellingShingle | WANG Yuhua, ZHANG Yuqi, HE Junfei, XU Yuezhu, CUI Huanyu TEB: Efficient SpMV Storage Format for Matrix Decomposition and Reconstruction on GPU Jisuanji kexue yu tansuo sparse matrix-vector multiplication (spmv); reorder; compressed sparse row (csr) format; load balancing; storage format; graphics processing unit (gpu) |
title | TEB: Efficient SpMV Storage Format for Matrix Decomposition and Reconstruction on GPU |
title_full | TEB: Efficient SpMV Storage Format for Matrix Decomposition and Reconstruction on GPU |
title_fullStr | TEB: Efficient SpMV Storage Format for Matrix Decomposition and Reconstruction on GPU |
title_full_unstemmed | TEB: Efficient SpMV Storage Format for Matrix Decomposition and Reconstruction on GPU |
title_short | TEB: Efficient SpMV Storage Format for Matrix Decomposition and Reconstruction on GPU |
title_sort | teb efficient spmv storage format for matrix decomposition and reconstruction on gpu |
topic | sparse matrix-vector multiplication (spmv); reorder; compressed sparse row (csr) format; load balancing; storage format; graphics processing unit (gpu) |
url | http://fcst.ceaj.org/fileup/1673-9418/PDF/2304039.pdf |
work_keys_str_mv | AT wangyuhuazhangyuqihejunfeixuyuezhucuihuanyu tebefficientspmvstorageformatformatrixdecompositionandreconstructionongpu |