Cimple: instruction and memory level parallelism: a DSL for uncovering ILP and MLP

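Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for inflight memory requests. These resources, however, often exhibit poor utilization rates on workloads with large working sets, e.g., in-memory databases, key-value stores, and graph analytics, as compilers and hardware struggle to expose ILP and MLP from the instruction stream automatically. In this paper, we introduce the IMLP (Instruction and Memory Level Parallelism) task programming model. IMLP tasks execute as coroutines that yield execution at annotated long-latency operations, e.g., memory accesses, divisions, or unpredictable branches. IMLP tasks are interleaved on a single thread, and integrate well with thread parallelism and vectorization. Our DSL embedded in C++, Cimple, allows exploration of task scheduling and transformations, such as buffering, vectorization, pipelining, and prefetching. We demonstrate state-of-the-art performance on core algorithms used in in-memory databases that operate on arrays, hash tables, trees, and skip lists. Cimple applications reach 2.5× throughput gains over hardware multithreading on a multi-core, and 6.4× single thread speedup.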
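The task model summarized in this record (tasks that yield at long-latency memory accesses and are interleaved on a single thread) can be illustrated with a small hand-written sketch. This is not Cimple's DSL, its API, or its generated code: `SearchTask`, `prefetch_mid`, and `interleaved_lower_bound` are hypothetical names chosen for illustration, each task is a manual state machine standing in for a coroutine, and binary search over a sorted array is assumed as the workload. `__builtin_prefetch` is a GCC/Clang builtin and is skipped on other compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; not Cimple's DSL or generated code.
// One in-flight binary search, written as an explicit state machine.
struct SearchTask {
    std::uint64_t key = 0;  // value being searched for
    std::size_t lo = 0;     // inclusive lower bound of the remaining range
    std::size_t hi = 0;     // exclusive upper bound of the remaining range
    std::size_t out = 0;    // slot in the results vector
    bool done = true;       // true when the slot holds no active search
};

// Request the cache line of the next probe so the load can overlap
// with work done by the other tasks.
inline void prefetch_mid(const std::vector<std::uint64_t>& a,
                         const SearchTask& t) {
    (void)a;
    if (t.lo < t.hi) {
#if defined(__GNUC__) || defined(__clang__)
        __builtin_prefetch(&a[t.lo + (t.hi - t.lo) / 2]);
#endif
    }
}

// For each key, return the index of the first element >= key.
// Up to `width` searches are interleaved round-robin on one thread.
std::vector<std::size_t> interleaved_lower_bound(
    const std::vector<std::uint64_t>& sorted,
    const std::vector<std::uint64_t>& keys,
    std::size_t width = 8) {
    assert(width > 0);
    std::vector<std::size_t> results(keys.size());
    std::vector<SearchTask> tasks(width);
    std::size_t next = 0;    // next key not yet assigned to a task
    std::size_t active = 0;  // number of tasks still running

    // Load the next key into a task slot; returns false if none remain.
    auto start = [&](SearchTask& t) {
        if (next == keys.size()) { t.done = true; return false; }
        t.key = keys[next];
        t.out = next++;
        t.lo = 0;
        t.hi = sorted.size();
        t.done = false;
        prefetch_mid(sorted, t);
        return true;
    };

    for (auto& t : tasks) {
        if (start(t)) ++active;
    }

    while (active > 0) {
        for (auto& t : tasks) {
            if (t.done) continue;
            if (t.lo < t.hi) {
                // Consume the element prefetched on the previous visit.
                std::size_t mid = t.lo + (t.hi - t.lo) / 2;
                if (sorted[mid] < t.key) t.lo = mid + 1; else t.hi = mid;
                // "Yield": issue the next prefetch and move to another task.
                prefetch_mid(sorted, t);
            } else {
                results[t.out] = t.lo;
                if (!start(t)) --active;
            }
        }
    }
    return results;
}
```

In this sketch the `width` parameter controls how many searches are in flight at once. Whether interleaving helps, and at what width, depends on the working-set size and the machine, so any setting would need to be measured.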

Bibliographic Details
Main Authors:	Kiriansky, Vladimir; Xu, Haoran; Rinard, Martin; Amarasinghe, Saman
Other Authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format:	Article
Language:	English
Published:	Association for Computing Machinery 2020
Online Access:	https://hdl.handle.net/1721.1/125080

author	Kiriansky, Vladimir; Xu, Haoran; Rinard, Martin; Amarasinghe, Saman
author2	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_sort	Kiriansky, Vladimir
collection	MIT
description	Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for inflight memory requests. These resources, however, often exhibit poor utilization rates on workloads with large working sets, e.g., in-memory databases, key-value stores, and graph analytics, as compilers and hardware struggle to expose ILP and MLP from the instruction stream automatically. In this paper, we introduce the IMLP (Instruction and Memory Level Parallelism) task programming model. IMLP tasks execute as coroutines that yield execution at annotated long-latency operations, e.g., memory accesses, divisions, or unpredictable branches. IMLP tasks are interleaved on a single thread, and integrate well with thread parallelism and vectorization. Our DSL embedded in C++, Cimple, allows exploration of task scheduling and transformations, such as buffering, vectorization, pipelining, and prefetching. We demonstrate state-of-the-art performance on core algorithms used in in-memory databases that operate on arrays, hash tables, trees, and skip lists. Cimple applications reach 2.5× throughput gains over hardware multithreading on a multi-core, and 6.4× single thread speedup.
first_indexed	2024-09-23T09:48:44Z
format	Article
id	mit-1721.1/125080
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T09:48:44Z
publishDate	2020
publisher	Association for Computing Machinery
record_format	dspace
spelling	mit-1721.1/125080; 2022-09-26T13:54:22Z; DOE (Grant DE-SC0014204); Toyota Research Institute (Grant LP-C000765-SR); 2020-05-06T20:05:53Z; 2020-05-06T20:05:53Z; 2018-11; 2019-07-02T16:34:47Z; Article; http://purl.org/eprint/type/ConferencePaper; 9781450359863; https://hdl.handle.net/1721.1/125080; Kiriansky, Vladimir et al. "Cimple: instruction and memory level parallelism: a DSL for uncovering ILP and MLP." Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (November 2018): 30; © 2018 Association for Computing Machinery; en; http://dx.doi.org/10.1145/3243176.3243185; Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques; Creative Commons Attribution-Noncommercial-Share Alike; http://creativecommons.org/licenses/by-nc-sa/4.0/; application/pdf; Association for Computing Machinery; arXiv
title	Cimple: instruction and memory level parallelism: a DSL for uncovering ILP and MLP
url	https://hdl.handle.net/1721.1/125080

Similar Items