Dynamic Cache Contention Detection in Multi-threaded Applications

In today's multi-core systems, cache contention due to true and false sharing can cause unexpected and significant performance degradation. A detailed understanding of a given multi-threaded application's behavior is required to precisely identify such performance bottlenecks. Traditionall...


Bibliographic details
Main authors:	Zhao, Qin; Koh, David F.; Raza, Syed A.; Amarasinghe, Saman P.; Bruening, Derek; Wong, Weng-Fai
Other authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format:	Article
Language:	en_US
Published:	Association for Computing Machinery / ACM Special Interest Group on Programming Languages / ACM Special Interest Group on Operating Systems. 2011
Online access:	http://hdl.handle.net/1721.1/62586 https://orcid.org/0000-0002-7231-7643

author	Zhao, Qin; Koh, David F.; Raza, Syed A.; Amarasinghe, Saman P.; Bruening, Derek; Wong, Weng-Fai
author2	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_sort	Zhao, Qin
collection	MIT
description	In today's multi-core systems, cache contention due to true and false sharing can cause unexpected and significant performance degradation. A detailed understanding of a given multi-threaded application's behavior is required to precisely identify such performance bottlenecks. Traditionally, however, such diagnostic information can only be obtained after lengthy simulation of the memory hierarchy. In this paper, we present a novel approach that efficiently analyzes interactions between threads to determine thread correlation and detect true and false sharing. It is based on the following key insight: although the slowdown caused by cache contention depends on factors including the thread-to-core binding and parameters of the memory hierarchy, the amount of data sharing is primarily a function of the cache line size and application behavior. Using memory shadowing and dynamic instrumentation, we implemented a tool that obtains detailed sharing information between threads without simulating the full complexity of the memory hierarchy. The runtime overhead of our approach (a 5x slowdown on average relative to native execution) is significantly less than that of detailed cache simulation. The information collected allows programmers to identify the degree of cache contention in an application, the correlation among its threads, and the sources of significant false sharing. Using our approach, we were able to improve the performance of some applications by up to a factor of 12. For other contention-intensive applications, we were able to shed light on the obstacles that prevent their performance from scaling to many cores.
first_indexed	2024-09-23T13:39:01Z
format	Article
id	mit-1721.1/62586
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T13:39:01Z
publishDate	2011
publisher	Association for Computing Machinery / ACM Special Interest Group on Programming Languages / ACM Special Interest Group on Operating Systems
record_format	dspace
spelling	mit-1721.1/62586; 2022-09-28T15:16:51Z; Dynamic Cache Contention Detection in Multi-threaded Applications; Zhao, Qin; Koh, David F.; Raza, Syed A.; Amarasinghe, Saman P.; Bruening, Derek; Wong, Weng-Fai; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; 2011-05-04T19:18:08Z; 2011-03; Article; http://purl.org/eprint/type/ConferencePaper; 978-1-4503-0687-4; http://hdl.handle.net/1721.1/62586; Zhao, Qin et al. “Dynamic Cache Contention Detection in Multi-threaded Applications.” Proceedings of the 7th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments - VEE ’11. Newport Beach, California, USA, 2011. 27.; Copyright © 2011 ACM; https://orcid.org/0000-0002-7231-7643; en_US; http://dx.doi.org/10.1145/1952682.1952688; VEE Proceedings (ACM SIGPLAN SIGOPS International Conference on Virtual Execution Environments); Creative Commons Attribution-Noncommercial-Share Alike 3.0; http://creativecommons.org/licenses/by-nc-sa/3.0/; application/pdf; Association for Computing Machinery / ACM Special Interest Group on Programming Languages / ACM Special Interest Group on Operating Systems; MIT web domain
title	Dynamic Cache Contention Detection in Multi-threaded Applications
url	http://hdl.handle.net/1721.1/62586 https://orcid.org/0000-0002-7231-7643
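
Illustration (not part of the catalog record, and not the authors' tool): the abstract's key insight is that the amount of data sharing depends mainly on the cache line size and on which bytes each thread touches. The minimal Python sketch below shows that idea on a hand-written access trace. The function name classify_lines, the trace tuple layout, and the 64-byte default line size are all assumptions made for this example. It keeps a per-byte "shadow" record of reader and writer threads and labels each line afterwards; it does not model a coherence protocol or any dynamic instrumentation.

```python
from collections import defaultdict


def classify_lines(trace, line_size=64):
    """Label every cache line touched by `trace`.

    `trace` is an iterable of (thread_id, address, size, is_write) tuples.
    Returns {line_index: label}, where label is one of "private",
    "read_shared", "true_sharing", or "false_sharing".
    """
    # Shadow state: for each cache line, for each byte offset in that line,
    # the set of threads that read it and the set that wrote it.
    readers = defaultdict(lambda: defaultdict(set))
    writers = defaultdict(lambda: defaultdict(set))

    for thread_id, address, size, is_write in trace:
        for byte in range(address, address + size):
            line, offset = divmod(byte, line_size)
            target = writers if is_write else readers
            target[line][offset].add(thread_id)

    labels = {}
    for line in set(readers) | set(writers):
        line_readers = readers.get(line, {})
        line_writers = writers.get(line, {})
        threads = set()
        any_write = False
        overlap = False
        for offset in set(line_readers) | set(line_writers):
            r = line_readers.get(offset, set())
            w = line_writers.get(offset, set())
            threads |= r | w
            any_write = any_write or bool(w)
            # Same byte, at least one writer, at least two distinct threads.
            if w and len(r | w) > 1:
                overlap = True
        if len(threads) == 1:
            labels[line] = "private"
        elif not any_write:
            labels[line] = "read_shared"
        elif overlap:
            labels[line] = "true_sharing"
        else:
            labels[line] = "false_sharing"
    return labels


# Hypothetical trace: two threads write adjacent 4-byte counters, both
# threads touch the word at address 128, and thread 0 alone uses address 256.
trace = [
    (0, 0, 4, True),
    (1, 4, 4, True),
    (0, 128, 4, True),
    (1, 128, 4, False),
    (0, 256, 4, True),
]
by_64 = classify_lines(trace, line_size=64)
by_4 = classify_lines(trace, line_size=4)
```

With 64-byte lines the two adjacent counters fall on the same line and are labelled false sharing; with a hypothetical 4-byte line each counter gets its own line and becomes private. The word at address 128 is true sharing in both cases, since both threads touch the same bytes and one of them writes.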
