High-performance SVD partial spectrum computation
We introduce a new singular value decomposition (SVD) solver based on the QR-based Dynamically Weighted Halley (QDWH) algorithm for solving partial spectrum SVD problems (QDWHpartial-SVD). By optimizing the rational function underlying the algorithms in the desired part of the spectrum only, t...
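The abstract builds on the QDWH iteration, which computes a polar decomposition A = U_p H; an eigendecomposition of the symmetric factor H then yields singular values and vectors. The following is only a minimal NumPy sketch of that general idea, not the authors' distributed implementation. The function names `qdwh_polar` and `leading_svd_via_polar` are hypothetical, the matrix is assumed real with full column rank, and the spectrum bounds are computed exactly for simplicity where a production code would use cheap estimates. Unlike QDWHpartial-SVD, this sketch processes the whole spectrum and merely truncates the result to the leading k triplets.

```python
import numpy as np

def qdwh_polar(A, tol=1e-12, max_iter=50):
    """Orthonormal polar factor of A via the QR-based dynamically weighted Halley iteration."""
    m, n = A.shape
    X = A / np.linalg.norm(A, 2)            # scale so that sigma_max(X) = 1
    l = min(1.0, np.linalg.norm(X, -2))     # sigma_min(X); assumption: computed exactly, must be > 0
    for _ in range(max_iter):
        l2 = l * l
        # Dynamically chosen weights of the rational (Halley-type) map
        d = (4.0 * (1.0 - l2) / l2**2) ** (1.0 / 3.0)
        a = np.sqrt(1.0 + d) + 0.5 * np.sqrt(
            8.0 - 4.0 * d + 8.0 * (2.0 - l2) / (l2 * np.sqrt(1.0 + d)))
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # Inverse-free update through a QR factorization of [sqrt(c) X; I]
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, np.eye(n)]))
        Q1, Q2 = Q[:m], Q[m:]
        X_new = (b / c) * X + ((a - b / c) / np.sqrt(c)) * (Q1 @ Q2.T)
        # Updated lower bound on the smallest singular value of the iterate
        l = min(1.0, l * (a + b * l2) / (1.0 + c * l2))
        converged = np.linalg.norm(X_new - X, "fro") <= tol * np.linalg.norm(X_new, "fro")
        X = X_new
        if converged:
            break
    return X

def leading_svd_via_polar(A, k):
    """Leading k singular triplets from A = U_p H followed by H = V diag(s) V^T."""
    Up = qdwh_polar(A)
    H = Up.T @ A
    H = 0.5 * (H + H.T)                     # enforce symmetry against rounding
    w, V = np.linalg.eigh(H)                # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]           # indices of the k largest
    return Up @ V[:, idx], w[idx], V[:, idx].T
```

Calling `leading_svd_via_polar(A, k)` on a tall, well-conditioned matrix returns `U`, `s`, `Vt` with `A @ Vt.T` close to `U * s`. The cost here is that of a full decomposition; the savings reported in the abstract come from restricting the work to the desired part of the spectrum, which this sketch does not attempt.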
Main Authors: | Keyes, D; Ltaief, H; Nakatsukasa, YN; Sukkari, D |
---|---|
Format: | Conference item |
Language: | English |
Published: | Association for Computing Machinery, 2023 |
_version_ | 1817931438617526272 |
---|---|
author | Keyes, D; Ltaief, H; Nakatsukasa, YN; Sukkari, D |
collection | OXFORD |
description | We introduce a new singular value decomposition (SVD) solver based on the QR-based Dynamically Weighted Halley (QDWH) algorithm for solving partial spectrum SVD problems (QDWHpartial-SVD). By optimizing the rational function underlying the algorithms in the desired part of the spectrum only, the QDWHpartial-SVD algorithm efficiently computes a fraction (say 1-20%) of the leading singular values/vectors. We develop a high-performance implementation of QDWHpartial-SVD on distributed-memory manycore systems and demonstrate its numerical robustness. We perform a benchmarking campaign against counterparts from the state-of-the-art numerical libraries across various matrix sizes using up to 36K MPI processes. Experimental results show performance speedups for QDWHpartial-SVD of up to 6X and 2X against vendor-optimized PDGESVD from ScaLAPACK and KSVD, respectively, on a Cray XC40 system using 1152 nodes based on two-socket 16-core Intel Haswell CPUs. We also port our QDWHpartial-SVD software library to a system composed of 256 nodes with two-socket 64-core AMD EPYC Milan CPUs and achieve a performance speedup of up to 4X compared to vendor-optimized PDGESVD from ScaLAPACK. We also compare energy consumption for the two algorithms and demonstrate how QDWHpartial-SVD can further outperform PDGESVD in that regard by performing fewer memory-bound operations. |
first_indexed | 2024-03-07T07:50:59Z |
format | Conference item |
id | oxford-uuid:473580ef-6b2d-4421-899d-5ace836f70ac |
institution | University of Oxford |
language | English |
last_indexed | 2024-12-09T03:22:01Z |
publishDate | 2023 |
publisher | Association for Computing Machinery |
record_format | dspace |
title | High-performance SVD partial spectrum computation |