HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution
Heterogeneous systems with multiple different compute devices have come into common use recently, and the heterogeneity of the compute device is mainly reflected in three aspects: hardware architecture, instruction set architecture, and processing capability. Heterogeneous CPU-accelerator systems ha...
Main Authors: | Lanjun Wan, Weihua Zheng, Xinpan Yuan |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
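Subjects: | Communication optimization, cooperative execution, data-parallel applications, dynamic scheduling, heterogeneous systems, runtime system |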
Online Access: | https://ieeexplore.ieee.org/document/9598836/ |
_version_ | 1819038623913213952 |
---|---|
author | Lanjun Wan, Weihua Zheng, Xinpan Yuan |
author_facet | Lanjun Wan, Weihua Zheng, Xinpan Yuan |
author_sort | Lanjun Wan |
collection | DOAJ |
description | Heterogeneous systems with multiple different compute devices have come into common use recently, and the heterogeneity of the compute device is mainly reflected in three aspects: hardware architecture, instruction set architecture, and processing capability. Heterogeneous CPU-accelerator systems have attracted increasing attention especially. To make full use of multiple CPUs and accelerators to execute data-parallel applications, programmers may need to manually map computation and data to all available compute devices, which is tedious, error-prone, and difficult. Especially for some data-parallel applications, the inter-device communication could easily become the performance bottleneck of multi-device co-execution. Therefore, firstly, a runtime system is designed for supporting heterogeneous cooperative execution (HCE) of data-parallel applications, which can help programmers to automatically and efficiently map computation and data to multiple compute devices. Secondly, an incremental data transfer method is designed to avoid redundant data transfers between devices, and a three-way overlapping communication optimization method based on software pipelining is designed to effectively hide the inter-device communication overhead. Based on our previously proposed feedback-based dynamic and elastic task scheduling (FDETS) scheme and asynchronous-based dynamic and elastic task scheduling (ADETS) scheme, the modified FDETS that supports incremental data transfer and the modified ADETS that supports three-way overlapping communication optimization are proposed, which not only can effectively partition and balance the workload among multiple compute devices but also can significantly reduce data transfer overhead between devices. Thirdly, a prototype of the proposed runtime system is implemented, which provides a set of runtime APIs for task scheduling, device management, memory management, and transfer optimization. Our experimental results show that the communication overhead between devices is greatly reduced using the proposed inter-device communication optimization methods and the multi-device co-execution significantly outperforms the best single-device execution. |
first_indexed | 2024-12-21T08:40:16Z |
format | Article |
id | doaj.art-f511f88b91524831aba56e62d3a74ea7 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-21T08:40:16Z |
publishDate | 2021-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-f511f88b91524831aba56e62d3a74ea7; 2022-12-21T19:09:58Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; 9; 147264; 147279; 10.1109/ACCESS.2021.3124856; 9598836; HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution; Lanjun Wan [0] (https://orcid.org/0000-0001-7236-3589), Weihua Zheng [1], Xinpan Yuan [2]; [0] School of Computer Science, Hunan University of Technology, Zhuzhou, China; [1] College of Electrical and Information Engineering, Hunan University of Technology, Zhuzhou, China; [2] School of Computer Science, Hunan University of Technology, Zhuzhou, China; https://ieeexplore.ieee.org/document/9598836/; Communication optimization, cooperative execution, data-parallel applications, dynamic scheduling, heterogeneous systems, runtime system |
spellingShingle | Lanjun Wan Weihua Zheng Xinpan Yuan HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution IEEE Access Communication optimization cooperative execution data-parallel applications dynamic scheduling heterogeneous systems runtime system |
title | HCE: A Runtime System for Efficiently Supporting Heterogeneous Cooperative Execution |
title_sort | hce a runtime system for efficiently supporting heterogeneous cooperative execution |
topic | Communication optimization, cooperative execution, data-parallel applications, dynamic scheduling, heterogeneous systems, runtime system |
url | https://ieeexplore.ieee.org/document/9598836/ |