HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution

Heterogeneous systems with multiple different compute devices have come into common use recently, and the heterogeneity of the compute device is mainly reflected in three aspects: hardware architecture, instruction set architecture, and processing capability. Heterogeneous CPU-accelerator systems ha...

Bibliographic Details
Main Authors: Lanjun Wan, Weihua Zheng, Xinpan Yuan
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
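Subjects: Communication optimization; cooperative execution; data-parallel applications; dynamic scheduling; heterogeneous systems; runtime system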
Online Access: https://ieeexplore.ieee.org/document/9598836/
author Lanjun Wan
Weihua Zheng
Xinpan Yuan
author_facet Lanjun Wan
Weihua Zheng
Xinpan Yuan
author_sort Lanjun Wan
collection DOAJ
description Heterogeneous systems that combine several kinds of compute devices are now in common use; the devices differ mainly in three respects: hardware architecture, instruction set architecture, and processing capability. Heterogeneous CPU-accelerator systems in particular have attracted increasing attention. To make full use of multiple CPUs and accelerators when executing data-parallel applications, programmers may need to map computation and data to all available compute devices by hand, which is tedious, difficult, and error-prone. For some data-parallel applications, moreover, inter-device communication can easily become the performance bottleneck of multi-device co-execution. First, therefore, a runtime system is designed to support heterogeneous cooperative execution (HCE) of data-parallel applications; it helps programmers map computation and data to multiple compute devices automatically and efficiently. Second, an incremental data transfer method is designed to avoid redundant data transfers between devices, and a three-way overlapping communication optimization method based on software pipelining is designed to hide inter-device communication overhead. Building on our previously proposed feedback-based dynamic and elastic task scheduling (FDETS) scheme and asynchronous-based dynamic and elastic task scheduling (ADETS) scheme, a modified FDETS that supports incremental data transfer and a modified ADETS that supports three-way overlapping communication optimization are proposed; these not only partition and balance the workload effectively among multiple compute devices but also significantly reduce data transfer overhead between devices. Third, a prototype of the proposed runtime system is implemented, providing a set of runtime APIs for task scheduling, device management, memory management, and transfer optimization. Our experimental results show that the proposed inter-device communication optimization methods greatly reduce communication overhead between devices and that multi-device co-execution significantly outperforms the best single-device execution.
first_indexed 2024-12-21T08:40:16Z
format Article
id doaj.art-f511f88b91524831aba56e62d3a74ea7
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-21T08:40:16Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-f511f88b91524831aba56e62d3a74ea7
2022-12-21T19:09:58Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
9
147264
147279
10.1109/ACCESS.2021.3124856
9598836
HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution
Lanjun Wan (https://orcid.org/0000-0001-7236-3589), School of Computer Science, Hunan University of Technology, Zhuzhou, China
Weihua Zheng, College of Electrical and Information Engineering, Hunan University of Technology, Zhuzhou, China
Xinpan Yuan, School of Computer Science, Hunan University of Technology, Zhuzhou, China
https://ieeexplore.ieee.org/document/9598836/
Communication optimization
cooperative execution
data-parallel applications
dynamic scheduling
heterogeneous systems
runtime system
spellingShingle Lanjun Wan
Weihua Zheng
Xinpan Yuan
HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution
IEEE Access
Communication optimization
cooperative execution
data-parallel applications
dynamic scheduling
heterogeneous systems
runtime system
title HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution
title_full HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution
title_fullStr HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution
title_full_unstemmed HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution
title_short HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution
title_sort hce a runtime system for efficiently supporting heterogeneous cooperative execution
topic Communication optimization
cooperative execution
data-parallel applications
dynamic scheduling
heterogeneous systems
runtime system
url https://ieeexplore.ieee.org/document/9598836/
work_keys_str_mv AT lanjunwan hcearuntimesystemforefficientlysupportingheterogeneouscooperativeexecution
AT weihuazheng hcearuntimesystemforefficientlysupportingheterogeneouscooperativeexecution
AT xinpanyuan hcearuntimesystemforefficientlysupportingheterogeneouscooperativeexecution