Manycore algorithms for batch scalar and block tridiagonal solvers

Bibliographic Details
Main Authors: László, E; Giles, M; Appleyard, J
Format: Journal article
Published: Association for Computing Machinery, 2016

Description: Engineering, scientific, and financial applications often require the simultaneous solution of a large number of independent tridiagonal systems of equations with varying coefficients. Because the number of systems is large enough to offer considerable parallelism on manycore systems, the choice between tridiagonal solution algorithms such as Thomas, Cyclic Reduction (CR), or Parallel Cyclic Reduction (PCR) needs to be reexamined. This work investigates the optimal choice of tridiagonal algorithm for CPU, Intel MIC, and NVIDIA GPU, with a focus on maximizing the achieved bandwidth and on minimizing data transfer to and from main memory through novel algorithms and the register-blocking mechanism. It also considers block tridiagonal solutions, which are sometimes required in Computational Fluid Dynamics (CFD) applications. A novel work-sharing and register-blocking-based Thomas solver is also presented.
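
The description contrasts the sequential Thomas algorithm with Parallel Cyclic Reduction (PCR) for batches of independent systems. The following is a minimal sketch in plain Python, written only to illustrate the two algorithms; it is not the authors' register-blocked CPU, Intel MIC, or GPU code. The names `thomas`, `pcr`, `solve_batch`, and `tridiag_matvec` are hypothetical, and neither solver pivots, so both assume diagonally dominant systems.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
System = Tuple[Vector, Vector, Vector, Vector]  # (a, b, c, d)


def thomas(a: Sequence[float], b: Sequence[float],
           c: Sequence[float], d: Sequence[float]) -> Vector:
    """Thomas algorithm: O(N) work, but each step depends on the previous one.

    a is the sub-diagonal (a[0] ignored), b the main diagonal,
    c the super-diagonal (c[-1] ignored), d the right-hand side.
    No pivoting, so the system is assumed to be diagonally dominant.
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward sweep eliminates the sub-diagonal
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # backward substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def pcr(a: Sequence[float], b: Sequence[float],
        c: Sequence[float], d: Sequence[float]) -> Vector:
    """Parallel Cyclic Reduction: O(N log N) work in ceil(log2 N) steps.

    Within a step every row i is updated independently of the others,
    which is what maps onto one thread per row on manycore hardware.
    Same conventions and no-pivoting assumption as thomas().
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0], c[-1] = 0.0, 0.0  # rows outside the system contribute nothing
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        for i in range(n):  # independent updates: the parallel loop
            lo, hi = i - stride, i + stride
            nb[i], nd[i] = b[i], d[i]
            if lo >= 0:  # eliminate x[lo] from row i using row lo
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:  # eliminate x[hi] from row i using row hi
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    return [d[i] / b[i] for i in range(n)]  # every row is now decoupled


def solve_batch(solver: Callable[..., Vector],
                systems: Sequence[System]) -> List[Vector]:
    """Solve many independent systems; this outer loop is the batch parallelism."""
    return [solver(*s) for s in systems]


def tridiag_matvec(a: Sequence[float], b: Sequence[float],
                   c: Sequence[float], x: Sequence[float]) -> Vector:
    """Compute d = A x for a tridiagonal A, used here only to build test data."""
    n = len(b)
    return [(a[i] * x[i - 1] if i > 0 else 0.0) + b[i] * x[i]
            + (c[i] * x[i + 1] if i < n - 1 else 0.0) for i in range(n)]


if __name__ == "__main__":
    a = [0.0, 1.0, 1.0, 1.0, 1.0]
    b = [4.0] * 5
    c = [1.0, 1.0, 1.0, 1.0, 0.0]
    x_true = [1.0, 2.0, 3.0, 4.0, 5.0]
    d = tridiag_matvec(a, b, c, x_true)  # [6.0, 12.0, 18.0, 24.0, 24.0]
    for name, solver in (("thomas", thomas), ("pcr", pcr)):
        (x,) = solve_batch(solver, [(a, b, c, d)])
        print(name, [round(v, 12) for v in x])
```

The sketch makes the trade-off concrete: Thomas does O(N) work per system, but its sweeps are inherently sequential, so on manycore hardware the parallelism has to come from the batch loop in `solve_batch`; PCR does more work, O(N log N), but exposes per-row parallelism inside each system, which helps when the batch alone is too small to occupy the device.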

Identifier: oxford-uuid:6985082c-8c58-4549-bfb8-6051b72b1fdc
Institution: University of Oxford