The Locality-First Strategy for Developing Efficient Multicore Algorithm

To scale applications on multicores up to bigger problems, software systems must be optimized both for parallelism to take full advantage of the multiple cores and for locality to exploit the memory system for cache-friendliness. Parallelization alone does not suffice to reach peak performance due t...

Full description

Bibliographic Details
Main Author: Xu, Helen Jiang
Other Authors: Leiserson, Charles E.
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access:https://hdl.handle.net/1721.1/143200
https://orcid.org/0000-0003-2232-3305
_version_ 1811077386483007488
author Xu, Helen Jiang
author2 Leiserson, Charles E.
author_facet Leiserson, Charles E.
Xu, Helen Jiang
author_sort Xu, Helen Jiang
collection MIT
description To scale applications on multicores up to bigger problems, software systems must be optimized both for parallelism to take full advantage of the multiple cores and for locality to exploit the memory system for cache-friendliness. Parallelization alone does not suffice to reach peak performance due to the processor-memory gap: the increasing divergence of processor and memory speeds. Locality and parallelism are difficult to optimize for independently — and even more challenging to combine — because they tend to conflict with each other. I advocate that algorithm developers employ a locality-first strategy for developing efficient parallel and cache-friendly algorithms for multicores. That is, they should first understand and exploit locality as much as possible before introducing parallelism. I argue that an algorithm developer can achieve high-performing code more easily with the locality-first strategy than with either a parallelism-first strategy or a strategy of trying to optimize both simultaneously. I present ten artifacts that leverage the locality-first strategy to create fast multicore algorithms that are simple to describe and implement. For example, locality-first data structure design in graph processing achieves about 2× speedup over the state of the art. Additionally, I prove mathematically that multicore cache-replacement algorithms that take advantage of locality outperform all other online algorithms. The other eight artifacts make similar contributions in their respective domains. Together, these artifacts demonstrate that the locality-first strategy provides an effective roadmap for algorithm developers to design and implement theoretically and practically efficient multicore code.
first_indexed 2024-09-23T10:42:11Z
format Thesis
id mit-1721.1/143200
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T10:42:11Z
publishDate 2022
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/1432002022-06-16T03:40:32Z The Locality-First Strategy for Developing Efficient Multicore Algorithm Xu, Helen Jiang Leiserson, Charles E. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science To scale applications on multicores up to bigger problems, software systems must be optimized both for parallelism to take full advantage of the multiple cores and for locality to exploit the memory system for cache-friendliness. Parallelization alone does not suffice to reach peak performance due to the processor-memory gap: the increasing divergence of processor and memory speeds. Locality and parallelism are difficult to optimize for independently — and even more challenging to combine — because they tend to conflict with each other. I advocate that algorithm developers employ a locality-first strategy for developing efficient parallel and cache-friendly algorithms for multicores. That is, they should first understand and exploit locality as much as possible before introducing parallelism. I argue that an algorithm developer can achieve high-performing code more easily with the locality-first strategy than with either a parallelism-first strategy or a strategy of trying to optimize both simultaneously. I present ten artifacts that leverage the locality-first strategy to create fast multicore algorithms that are simple to describe and implement. For example, locality-first data structure design in graph processing achieves about 2× speedup over the state of the art. Additionally, I prove mathematically that multicore cache-replacement algorithms that take advantage of locality outperform all other online algorithms. The other eight artifacts make similar contributions in their respective domains. Together, these artifacts demonstrate that the locality-first strategy provides an effective roadmap for algorithm developers to design and implement theoretically and practically efficient multicore code. Ph.D. 2022-06-15T13:03:04Z 2022-06-15T13:03:04Z 2022-02 2022-03-04T20:47:45.889Z Thesis https://hdl.handle.net/1721.1/143200 https://orcid.org/0000-0003-2232-3305 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle Xu, Helen Jiang
The Locality-First Strategy for Developing Efficient Multicore Algorithm
title The Locality-First Strategy for Developing Efficient Multicore Algorithm
title_full The Locality-First Strategy for Developing Efficient Multicore Algorithm
title_fullStr The Locality-First Strategy for Developing Efficient Multicore Algorithm
title_full_unstemmed The Locality-First Strategy for Developing Efficient Multicore Algorithm
title_short The Locality-First Strategy for Developing Efficient Multicore Algorithm
title_sort locality first strategy for developing efficient multicore algorithm
url https://hdl.handle.net/1721.1/143200
https://orcid.org/0000-0003-2232-3305
work_keys_str_mv AT xuhelenjiang thelocalityfirststrategyfordevelopingefficientmulticorealgorithm
AT xuhelenjiang localityfirststrategyfordevelopingefficientmulticorealgorithm