Manycore algorithms for batch scalar and block tridiagonal solvers
Main Authors: | László, E; Giles, M; Appleyard, J |
---|---|
Format: | Journal article |
Published: | Association for Computing Machinery, 2016 |
author | László, E; Giles, M; Appleyard, J |
---|---|
collection | OXFORD |
description | Engineering, scientific, and financial applications often require the simultaneous solution of a large number of independent tridiagonal systems of equations with varying coefficients. Since the number of systems is large enough to offer considerable parallelism on manycore systems, the choice between different tridiagonal solution algorithms, such as Thomas, Cyclic Reduction (CR), or Parallel Cyclic Reduction (PCR), needs to be re-examined. This work investigates the optimal choice of tridiagonal algorithm for CPU, Intel MIC, and NVIDIA GPU, with a focus on minimizing data transfer to and from main memory, using novel algorithms and the register-blocking mechanism, and on maximizing the achieved bandwidth. It also considers block tridiagonal solutions, which are sometimes required in Computational Fluid Dynamics (CFD) applications. A novel work-sharing and register-blocking-based Thomas solver is also presented. |
id | oxford-uuid:6985082c-8c58-4549-bfb8-6051b72b1fdc |
institution | University of Oxford |
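
For readers unfamiliar with the Thomas algorithm named in the description, the sketch below shows the sequential per-system algorithm and the batch structure that makes the problem parallel. It is a minimal Python illustration written under our own assumptions, not the authors' work-sharing or register-blocking solver: `thomas_solve`, `batch_thomas` and the `(a, b, c, d)` storage convention (sub-diagonal, diagonal, super-diagonal, right-hand side) are hypothetical names chosen here, and no pivoting is done, so each matrix is assumed to be diagonally dominant.

```python
from typing import List, Sequence, Tuple

# Hypothetical storage convention for one system of size n:
# (a, b, c, d) = (sub-diagonal, diagonal, super-diagonal, right-hand side),
# all of length n; a[0] and c[n-1] are ignored.
System = Tuple[Sequence[float], Sequence[float], Sequence[float], Sequence[float]]


def thomas_solve(a: Sequence[float], b: Sequence[float],
                 c: Sequence[float], d: Sequence[float]) -> List[float]:
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for one system.

    No pivoting is performed, so the matrix is assumed to be diagonally
    dominant (or otherwise stable without pivoting).
    """
    n = len(b)
    c_p = [0.0] * n  # modified super-diagonal
    d_p = [0.0] * n  # modified right-hand side
    # Forward sweep: eliminate the sub-diagonal row by row.
    c_p[0] = c[0] / b[0] if n > 1 else 0.0
    d_p[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * c_p[i - 1]
        c_p[i] = c[i] / denom if i < n - 1 else 0.0
        d_p[i] = (d[i] - a[i] * d_p[i - 1]) / denom
    # Backward substitution from the last unknown to the first.
    x = [0.0] * n
    x[-1] = d_p[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_p[i] - c_p[i] * x[i + 1]
    return x


def batch_thomas(systems: Sequence[System]) -> List[List[float]]:
    """Solve many independent systems (their sizes may differ).

    The systems share no data, so each one could be handled by its own
    thread; this sketch simply loops over them sequentially.
    """
    return [thomas_solve(a, b, c, d) for a, b, c, d in systems]


if __name__ == "__main__":
    batch = [
        ([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, 0.0], [1.0, 0.0, 1.0]),
        ([0.0, 1.0], [4.0, 4.0], [1.0, 0.0], [5.0, 5.0]),
    ]
    for x in batch_thomas(batch):
        # Both systems are built so the exact solution is all ones;
        # printed values may differ from 1.0 in the last digit.
        print(x)
```

Each system is solved in two sequential O(n) sweeps, so there is little parallelism inside a single system; in a batch, the parallelism comes from the independent systems, which is why one thread per system (an assumption of this sketch, not a claim about the paper) is a natural starting point. For block tridiagonal systems, the same two sweeps apply with each scalar division replaced by a small dense matrix solve.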