Design and Implementation of Multithreaded Reproducible DGEMV for Phytium Processor
In high-performance computing, the accumulation of rounding error when solving large-scale, long-running, ill-conditioned problems can invalidate the results. Reproducible results help developers debug programs and check their correctness. Therefore, the reproducibility of...
Main Author: | |
---|---|
Format: | Article |
Language: | zho |
Published: | Editorial office of Computer Science, 2022-10-01 |
Series: | Jisuanji kexue |
Subjects: | |
Online Access: | https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-10-27.pdf |
_version_ | 1797845110477553664 |
---|---|
author | CHEN Lei, TANG Tao, QI Hai-jun, JIANG Hao, HE Kang |
author_facet | CHEN Lei, TANG Tao, QI Hai-jun, JIANG Hao, HE Kang |
author_sort | CHEN Lei, TANG Tao, QI Hai-jun, JIANG Hao, HE Kang |
collection | DOAJ |
description | In high-performance computing, the accumulation of rounding error when solving large-scale, long-running, ill-conditioned problems can invalidate the results. Reproducible results help developers debug programs and check their correctness, so the reproducibility of an algorithm's numerical results becomes very important. Building on the OpenBLAS framework and combining Demmel's reproducible method from ReproBLAS with the multilayer blocking technique proposed by Castaldo, this paper designs a reproducible multithreaded DGEMV algorithm for the Phytium processor, supported by rounding error analysis and error-free transformations. Numerical experiments show that the algorithm's output is identical to that of ReproBLAS, which verifies its reproducibility. The algorithm is up to 2x faster than the ReproBLAS implementation. Compared with the DGEMV function of OzBLAS proposed by Mukunoki, it runs at least 20x faster with a single thread and 9x faster with multiple threads. Theoretical analysis and numerical experiments show that the improved algorithm is accurate, valid, and efficient. |
first_indexed | 2024-04-09T17:33:16Z |
format | Article |
id | doaj.art-1c7197b429894b7da7d1c60c33279d01 |
institution | Directory of Open Access Journals |
issn | 1002-137X |
language | zho |
last_indexed | 2024-04-09T17:33:16Z |
publishDate | 2022-10-01 |
publisher | Editorial office of Computer Science |
record_format | Article |
series | Jisuanji kexue |
spelling | doaj.art-1c7197b429894b7da7d1c60c33279d01; 2023-04-18T02:32:39Z; zho; Editorial office of Computer Science; Jisuanji kexue; ISSN 1002-137X; 2022-10-01; vol. 49, no. 10, pp. 27-35; DOI 10.11896/jsjkx.220100125; Design and Implementation of Multithreaded Reproducible DGEMV for Phytium Processor; CHEN Lei, TANG Tao, QI Hai-jun, JIANG Hao, HE Kang; College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-10-27.pdf; reproducibility, round-off error, error-free transformation, dgemv |
spellingShingle | CHEN Lei, TANG Tao, QI Hai-jun, JIANG Hao, HE Kang; Design and Implementation of Multithreaded Reproducible DGEMV for Phytium Processor; Jisuanji kexue; reproducibility, round-off error, error-free transformation, dgemv |
title | Design and Implementation of Multithreaded Reproducible DGEMV for Phytium Processor |
title_full | Design and Implementation of Multithreaded Reproducible DGEMV for Phytium Processor |
title_fullStr | Design and Implementation of Multithreaded Reproducible DGEMV for Phytium Processor |
title_full_unstemmed | Design and Implementation of Multithreaded Reproducible DGEMV for Phytium Processor |
title_short | Design and Implementation of Multithreaded Reproducible DGEMV for Phytium Processor |
title_sort | design and implementation of multithreaded reproducible dgemv for phytium processor |
topic | reproducibility, round-off error, error-free transformation, dgemv |
url | https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-10-27.pdf |
work_keys_str_mv | AT chenleitangtaoqihaijunjianghaohekang designandimplementationofmultithreadedreproducibledgemvforphytiumprocessor |
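
The description refers to round-off error and error-free transformation without showing either. The following is a minimal Python sketch, not the paper's algorithm and not the binned method used in ReproBLAS: it shows that a plain floating-point sum can depend on the order of its terms, and that Knuth's TwoSum (a standard error-free transformation) recovers the rounding error of each addition exactly. The function names `naive_sum`, `two_sum`, and `compensated_sum` are illustrative choices, not taken from the article or from any library.

```python
from fractions import Fraction


def naive_sum(values):
    """Left-to-right floating-point sum; the result can depend on term order."""
    total = 0.0
    for v in values:
        total += v
    return total


def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and a + b == s + e exactly.

    The identity holds for IEEE 754 doubles as long as no overflow occurs.
    """
    s = a + b
    b_virtual = s - a          # the part of b that actually made it into s
    a_virtual = s - b_virtual  # the part of a that actually made it into s
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def compensated_sum(values):
    """Sum values while accumulating the rounding errors captured by two_sum."""
    total = 0.0
    correction = 0.0
    for v in values:
        total, err = two_sum(total, v)
        correction += err
    return total + correction


if __name__ == "__main__":
    order_a = [1.0, 1e100, 1.0, -1e100]
    order_b = [1e100, -1e100, 1.0, 1.0]  # same terms, different order

    print(naive_sum(order_a))        # 0.0: both small terms are absorbed by 1e100
    print(naive_sum(order_b))        # 2.0: the large terms cancel first
    print(compensated_sum(order_a))  # 2.0
    print(compensated_sum(order_b))  # 2.0

    # The transformation is error-free: s + e equals a + b as exact rationals.
    s, e = two_sum(0.1, 0.2)
    print(Fraction(s) + Fraction(e) == Fraction(0.1) + Fraction(0.2))  # True
```

The compensated sum agrees across both orderings for this input, but that is a property of the example rather than a guarantee. Bitwise reproducibility across arbitrary orderings and thread counts, as the description claims for the multithreaded DGEMV, requires a stronger scheme such as the ReproBLAS method the authors build on.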