DeLoc: A Locality and Memory-Congestion-Aware Task Mapping Method for Modern NUMA Systems

Bibliographic Details
Main Authors: Mulya Agung, Muhammad Alfian Amrizal, Ryusuke Egawa, Hiroyuki Takizawa
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: High-performance computing; locality; memory congestion; NUMA; process mapping; task mapping
Online Access: https://ieeexplore.ieee.org/document/8949493/
collection DOAJ
description The mapping of tasks to processor cores, called task mapping, is crucial to achieving scalable performance on multicore processors. On modern NUMA (non-uniform memory access) systems, memory congestion can degrade performance more severely than poor data locality, because heavy congestion on shared caches and memory controllers can cause long latencies. Conventional work on task mapping mostly focuses on improving the locality of memory accesses. However, our previous work showed that on modern NUMA systems, maximizing locality can degrade performance due to memory congestion. In this work, we propose a task mapping method that addresses both the locality and the memory congestion problems to improve the performance of parallel applications. The proposed method first analyzes the spatial and temporal communication behaviors of an application from a time-series dataset of communications among its parallel tasks. Then, a data clustering technique is employed to detect groups of tasks that potentially cause memory congestion. Finally, this information is used to compute a task mapping that improves locality and reduces memory congestion. We also provide a set of metrics to describe the communication behaviors and to evaluate whether a target application can benefit from our method. The proposed method is evaluated with the NPB and PARSEC applications on a real NUMA system and a multicore simulator. A detailed analysis of the sources of performance gain is also provided. Experimental results show that our method can achieve up to a 61% performance improvement compared with the state-of-the-art locality-based method.
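The three steps named in the description (analyze a time series of communications, cluster tasks that may cause congestion, compute a mapping) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (aggregate, heavy_tasks, map_tasks), the use of 1-D 2-means clustering on per-task communication volume, and the round-robin plus greedy placement are all assumptions chosen only to make the idea concrete.

```python
from typing import List

Matrix = List[List[float]]


def aggregate(series: List[Matrix]) -> Matrix:
    """Sum a time series of N x N communication matrices into one matrix."""
    n = len(series[0])
    total = [[0.0] * n for _ in range(n)]
    for snapshot in series:
        for i in range(n):
            for j in range(n):
                total[i][j] += snapshot[i][j]
    return total


def heavy_tasks(load: List[float], iters: int = 20) -> List[int]:
    """Hypothetical clustering step: 1-D 2-means on per-task load.

    Returns the indices of tasks in the high-load cluster.
    """
    lo, hi = min(load), max(load)
    if lo == hi:
        return []  # uniform load: no distinct heavy group
    for _ in range(iters):
        heavy = [x for x in load if abs(x - hi) < abs(x - lo)]
        light = [x for x in load if abs(x - hi) >= abs(x - lo)]
        lo, hi = sum(light) / len(light), sum(heavy) / len(heavy)
    return [t for t, x in enumerate(load) if abs(x - hi) < abs(x - lo)]


def map_tasks(series: List[Matrix], nodes: int, cores_per_node: int) -> List[int]:
    """Return placement[t] = NUMA node chosen for task t (illustrative only)."""
    comm = aggregate(series)
    n = len(comm)
    if n > nodes * cores_per_node:
        raise ValueError("more tasks than cores")
    # Per-task communication volume (sent plus received).
    load = [sum(comm[t][u] + comm[u][t] for u in range(n) if u != t)
            for t in range(n)]
    placement = [-1] * n
    used = [0] * nodes
    # Congestion step: spread the heavy tasks round-robin across nodes.
    heavy = sorted(heavy_tasks(load), key=lambda t: -load[t])
    for k, t in enumerate(heavy):
        placement[t] = k % nodes
        used[k % nodes] += 1
    # Locality step: put each remaining task on the node with a free core
    # where it communicates most with tasks that are already placed.
    rest = sorted((t for t in range(n) if placement[t] < 0),
                  key=lambda t: -load[t])
    for t in rest:
        free = [v for v in range(nodes) if used[v] < cores_per_node]

        def affinity(v: int) -> float:
            return sum(comm[t][u] + comm[u][t]
                       for u in range(n) if placement[u] == v)

        best = max(free, key=lambda v: (affinity(v), -used[v]))
        placement[t] = best
        used[best] += 1
    return placement


if __name__ == "__main__":
    # Tasks 0 and 1 communicate heavily; 2 talks mostly to 0, 3 mostly to 1.
    snapshot = [[0, 100, 10, 0],
                [100, 0, 0, 10],
                [10, 0, 0, 1],
                [0, 10, 1, 0]]
    print(map_tasks([snapshot, snapshot], nodes=2, cores_per_node=2))
    # prints [0, 1, 0, 1]
```

In this toy input a purely locality-driven mapping would co-locate tasks 0 and 1, whereas the sketch separates them and then places tasks 2 and 3 next to their main partners. How the actual method weighs the two goals is described in the paper, not here.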
first_indexed 2024-12-22T19:28:33Z
id doaj.art-e2eb797b73d74a889c1900589bdc08c8
institution Directory of Open Access Journals
issn 2169-3536
last_indexed 2024-12-22T19:28:33Z
spelling DeLoc: A Locality and Memory-Congestion-Aware Task Mapping Method for Modern NUMA Systems. IEEE Access (IEEE), ISSN 2169-3536, vol. 8, pp. 6937-6953, 2020-01-01. DOI: 10.1109/ACCESS.2019.2963726. IEEE document number: 8949493. Language: eng. Record: doaj.art-e2eb797b73d74a889c1900589bdc08c8, timestamp 2022-12-21T18:15:10Z. URL: https://ieeexplore.ieee.org/document/8949493/
Mulya Agung (https://orcid.org/0000-0001-9521-2177), Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Muhammad Alfian Amrizal (https://orcid.org/0000-0003-1124-5137), Research Institute of Electrical Communication, Tohoku University, Sendai, Japan
Ryusuke Egawa (https://orcid.org/0000-0001-8966-867X), Cyberscience Center, Tohoku University, Sendai, Japan
Hiroyuki Takizawa (https://orcid.org/0000-0003-2858-3140), Cyberscience Center, Tohoku University, Sendai, Japan
topic High-performance computing
locality
memory congestion
NUMA
process mapping
task mapping