Tardis 2.0

Bibliographic Details
Main Authors: Yu, Xiangyao; Liu, Hongzhe; Zou, Ethan; Devadas, Srinivas
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: Association for Computing Machinery (ACM) 2018
Online Access: http://hdl.handle.net/1721.1/115327
Author ORCID iDs: https://orcid.org/0000-0003-4317-3457, https://orcid.org/0000-0001-8253-7714

Abstract

Cache coherence scalability is a big challenge in shared memory systems. Traditional protocols do not scale due to the storage and traffic overhead of cache invalidation. Tardis, a recently proposed coherence protocol, removes cache invalidation using logical timestamps and achieves excellent scalability. The original Tardis protocol, however, only supports the Sequential Consistency (SC) memory model, limiting its applicability. Tardis also incurs extra network traffic on some benchmarks due to renew messages, and has suboptimal performance when the program uses spinning to communicate between threads. In this paper, we address these downsides of the Tardis protocol and make it significantly more practical. Specifically, we discuss the architectural, memory system and protocol changes required in order to implement the TSO consistency model on Tardis, and prove that the modified protocol satisfies TSO. We also describe modifications for Partial Store Order (PSO) and Release Consistency (RC). Finally, we propose optimizations for better leasing policies and to handle program spinning. On a set of benchmarks, optimized Tardis improves on a full-map directory protocol in the metrics of performance, storage and network traffic, while being simpler to implement.
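The abstract's central mechanism, replacing invalidations with logical timestamps and leases, can be illustrated with a small Python model. This is a minimal sketch under stated assumptions, not the authors' protocol: it follows the sequentially consistent scheme of the original Tardis in simplified form (each core keeps a program timestamp `pts`; each cached copy carries a write timestamp `wts` and a lease end `rts`; a load may hit locally only while `pts <= rts`; a store moves the writer's `pts` past every outstanding lease instead of sending invalidations). The class names `Line`, `Copy`, `Core` and `TardisModel`, the fixed `LEASE` value and the refetch-on-expiry behavior are hypothetical simplifications; there is no network, no TSO store buffer and none of the Tardis 2.0 leasing or spinning optimizations.

```python
from dataclasses import dataclass, field

LEASE = 10  # hypothetical fixed lease length, in logical-time units


@dataclass
class Line:
    """Shared (last-level) copy of one address plus its timestamps."""
    value: int = 0
    wts: int = 0  # logical time at which the current value was written
    rts: int = 0  # latest logical time up to which the value may be read


@dataclass
class Copy:
    """Private cached copy; readable for logical times in [wts, rts]."""
    value: int
    wts: int
    rts: int


@dataclass
class Core:
    pts: int = 0  # program timestamp: this core's position in logical time
    cache: dict = field(default_factory=dict)


class TardisModel:
    """Toy model: writers never invalidate readers; leases expire instead."""

    def __init__(self):
        self.mem = {}
        self.renews = 0  # loads that found an expired lease and went to shared memory

    def load(self, core, addr):
        copy = core.cache.get(addr)
        if copy is not None:
            core.pts = max(core.pts, copy.wts)
            if core.pts <= copy.rts:
                return copy.value  # lease still covers pts: local hit
            self.renews += 1  # lease expired: renew (here, simply refetch)
        line = self.mem.setdefault(addr, Line())
        core.pts = max(core.pts, line.wts)
        # Grant a lease so this core can read locally for at least LEASE more ticks.
        line.rts = max(line.rts, line.wts + LEASE, core.pts + LEASE)
        core.cache[addr] = Copy(line.value, line.wts, line.rts)
        return line.value

    def store(self, core, addr, value):
        line = self.mem.setdefault(addr, Line())
        # Jump past every outstanding lease instead of invalidating readers.
        core.pts = max(core.pts, line.rts + 1)
        line.value, line.wts, line.rts = value, core.pts, core.pts
        core.cache[addr] = Copy(value, core.pts, core.pts)


if __name__ == "__main__":
    m = TardisModel()
    a, b = Core(), Core()
    m.store(a, "x", 1)
    assert m.load(b, "x") == 1  # b obtains a lease on x
    m.store(a, "x", 2)          # no invalidation is sent to b
    assert m.load(b, "x") == 1  # b still reads its lease (logically earlier)
    m.store(a, "y", 7)
    assert m.load(b, "y") == 7  # b's pts jumps forward to a's write time
    assert m.load(b, "x") == 2  # x's lease has expired: renew sees the new value
    print("renews:", m.renews)
```

In the demo, core `b` keeps reading the old value of `x` after `a` overwrites it, which is legal because `b` is still at an earlier logical time; once reading `y` moves `b`'s `pts` past its lease on `x`, the next load of `x` must renew and returns the new value. In this model the two weaknesses the abstract names also become visible: a short `LEASE` produces more renews, and a core that spins on `x` alone never advances its `pts`, so it keeps hitting its stale lease.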

Publication Details
Departments: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
MIT Authors: Yu, Xiangyao; Devadas, Srinivas
Type: Conference paper
Date Issued: 2016-09
Date Added: 2018-05-11
ISBN: 978-1-4503-4121-9
DOI: http://dx.doi.org/10.1145/2967938.2967942
Citation: Yu, Xiangyao, et al. "Tardis 2.0: Optimized Time Traveling Coherence for Relaxed Consistency Models." PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 11-15 September 2016, Haifa, Israel, ACM Press, 2016, pp. 261–274.
Rights: Creative Commons Attribution-Noncommercial-Share Alike (http://creativecommons.org/licenses/by-nc-sa/4.0/)
File Format: application/pdf
Source: MIT Web Domain