Optimizing Iterative Data-Flow Scientific Applications Using Directed Cyclic Graphs

Data-flow programming models have become a popular choice for writing parallel applications as an alternative to traditional work-sharing parallelism. They are better suited to write applications with irregular parallelism that can present load imbalance. However, these programming models suffer fro...

Full description

Bibliographic Details
Main Authors: David Alvarez, Vicenc Beltran
Format: Article
Language:English
Published: IEEE 2023-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10107598/
_version_ 1797810620481929216
author David Alvarez
Vicenc Beltran
author_facet David Alvarez
Vicenc Beltran
author_sort David Alvarez
collection DOAJ
description Data-flow programming models have become a popular choice for writing parallel applications as an alternative to traditional work-sharing parallelism. They are better suited to write applications with irregular parallelism that can present load imbalance. However, these programming models suffer from overheads related to task creation, scheduling and dependency management, limiting performance and scalability when tasks become too small. At the same time, many HPC applications implement iterative methods or multi-step simulations that create the same directed acyclic graphs of tasks on each iteration. By giving application programmers a way to express that a specific loop is creating the same task pattern on each iteration, we can create a single task directed acyclic graph (DAG) once and transform it into a cyclic graph. This cyclic graph is then reused for successive iterations, minimizing task creation and dependency management overhead. This paper presents the taskiter, a new construct we propose for the OmpSs-2 and OpenMP programming models, allowing the use of directed cyclic task graphs (DCTG) to minimize runtime overheads. Moreover, we present a simple immediate successor locality-aware heuristic that minimizes task scheduling overhead by bypassing the runtime task scheduler. We evaluate the implementation of the taskiter and the immediate successor heuristic in 8 iterative benchmarks. Using small task granularities, we obtain a geometric mean speedup of 2.56x over the reference OmpSs-2 implementation, and a 3.77x and 5.2x speedup over the LLVM and GCC OpenMP runtimes, respectively.
first_indexed 2024-03-13T07:11:33Z
format Article
id doaj.art-2733358dd8db41d1ad33688700f61b55
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-13T07:11:33Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-2733358dd8db41d1ad33688700f61b552023-06-05T23:00:16ZengIEEEIEEE Access2169-35362023-01-0111519715198410.1109/ACCESS.2023.326990210107598Optimizing Iterative Data-Flow Scientific Applications Using Directed Cyclic GraphsDavid Alvarez0https://orcid.org/0000-0002-4607-1627Vicenc Beltran1https://orcid.org/0000-0002-3580-9630Barcelona Supercomputing Center, Barcelona, SpainBarcelona Supercomputing Center, Barcelona, SpainData-flow programming models have become a popular choice for writing parallel applications as an alternative to traditional work-sharing parallelism. They are better suited to write applications with irregular parallelism that can present load imbalance. However, these programming models suffer from overheads related to task creation, scheduling and dependency management, limiting performance and scalability when tasks become too small. At the same time, many HPC applications implement iterative methods or multi-step simulations that create the same directed acyclic graphs of tasks on each iteration. By giving application programmers a way to express that a specific loop is creating the same task pattern on each iteration, we can create a single task directed acyclic graph (DAG) once and transform it into a cyclic graph. This cyclic graph is then reused for successive iterations, minimizing task creation and dependency management overhead. This paper presents the taskiter, a new construct we propose for the OmpSs-2 and OpenMP programming models, allowing the use of directed cyclic task graphs (DCTG) to minimize runtime overheads. Moreover, we present a simple immediate successor locality-aware heuristic that minimizes task scheduling overhead by bypassing the runtime task scheduler. We evaluate the implementation of the taskiter and the immediate successor heuristic in 8 iterative benchmarks. Using small task granularities, we obtain a geometric mean speedup of 2.56x over the reference OmpSs-2 implementation, and a 3.77x and 5.2x speedup over the LLVM and GCC OpenMP runtimes, respectively.https://ieeexplore.ieee.org/document/10107598/Taskiterdata-flow programmingompss-2openmpiterative applications
spellingShingle David Alvarez
Vicenc Beltran
Optimizing Iterative Data-Flow Scientific Applications Using Directed Cyclic Graphs
IEEE Access
Taskiter
data-flow programming
ompss-2
openmp
iterative applications
title Optimizing Iterative Data-Flow Scientific Applications Using Directed Cyclic Graphs
title_full Optimizing Iterative Data-Flow Scientific Applications Using Directed Cyclic Graphs
title_fullStr Optimizing Iterative Data-Flow Scientific Applications Using Directed Cyclic Graphs
title_full_unstemmed Optimizing Iterative Data-Flow Scientific Applications Using Directed Cyclic Graphs
title_short Optimizing Iterative Data-Flow Scientific Applications Using Directed Cyclic Graphs
title_sort optimizing iterative data flow scientific applications using directed cyclic graphs
topic Taskiter
data-flow programming
ompss-2
openmp
iterative applications
url https://ieeexplore.ieee.org/document/10107598/
work_keys_str_mv AT davidalvarez optimizingiterativedataflowscientificapplicationsusingdirectedcyclicgraphs
AT vicencbeltran optimizingiterativedataflowscientificapplicationsusingdirectedcyclicgraphs