The Locality-First Strategy for Developing Efficient Multicore Algorithm
To scale applications on multicores up to bigger problems, software systems must be optimized both for parallelism to take full advantage of the multiple cores and for locality to exploit the memory system for cache-friendliness. Parallelization alone does not suffice to reach peak performance due t...
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis |
Published: |
Massachusetts Institute of Technology
2022
|
Online Access: | https://hdl.handle.net/1721.1/143200 https://orcid.org/0000-0003-2232-3305 |
_version_ | 1811077386483007488 |
---|---|
author | Xu, Helen Jiang |
author2 | Leiserson, Charles E. |
author_facet | Leiserson, Charles E. Xu, Helen Jiang |
author_sort | Xu, Helen Jiang |
collection | MIT |
description | To scale applications on multicores up to bigger problems, software systems must be optimized both for parallelism to take full advantage of the multiple cores and for locality to exploit the memory system for cache-friendliness. Parallelization alone does not suffice to reach peak performance due to the processor-memory gap: the increasing divergence of processor and memory speeds. Locality and parallelism are difficult to optimize for independently — and even more challenging to combine — because they tend to conflict with each other.
I advocate that algorithm developers employ a locality-first strategy for developing efficient parallel and cache-friendly algorithms for multicores. That is, they should first understand and exploit locality as much as possible before introducing parallelism. I argue that an algorithm developer can achieve high-performing code more easily with the locality-first strategy than with either a parallelism-first strategy or a strategy of trying to optimize both simultaneously.
I present ten artifacts that leverage the locality-first strategy to create fast multicore algorithms that are simple to describe and implement. For example, locality-first data structure design in graph processing achieves about 2× speedup over the state of the art. Additionally, I prove mathematically that multicore cache-replacement algorithms that take advantage of locality outperform all other online algorithms. The other eight artifacts make similar contributions in their respective domains. Together, these artifacts demonstrate that the locality-first strategy provides an effective roadmap for algorithm developers to design and implement theoretically and practically efficient multicore code. |
first_indexed | 2024-09-23T10:42:11Z |
format | Thesis |
id | mit-1721.1/143200 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T10:42:11Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1432002022-06-16T03:40:32Z The Locality-First Strategy for Developing Efficient Multicore Algorithm Xu, Helen Jiang Leiserson, Charles E. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science To scale applications on multicores up to bigger problems, software systems must be optimized both for parallelism to take full advantage of the multiple cores and for locality to exploit the memory system for cache-friendliness. Parallelization alone does not suffice to reach peak performance due to the processor-memory gap: the increasing divergence of processor and memory speeds. Locality and parallelism are difficult to optimize for independently — and even more challenging to combine — because they tend to conflict with each other. I advocate that algorithm developers employ a locality-first strategy for developing efficient parallel and cache-friendly algorithms for multicores. That is, they should first understand and exploit locality as much as possible before introducing parallelism. I argue that an algorithm developer can achieve high-performing code more easily with the locality-first strategy than with either a parallelism-first strategy or a strategy of trying to optimize both simultaneously. I present ten artifacts that leverage the locality-first strategy to create fast multicore algorithms that are simple to describe and implement. For example, locality-first data structure design in graph processing achieves about 2× speedup over the state of the art. Additionally, I prove mathematically that multicore cache-replacement algorithms that take advantage of locality outperform all other online algorithms. The other eight artifacts make similar contributions in their respective domains. Together, these artifacts demonstrate that the locality-first strategy provides an effective roadmap for algorithm developers to design and implement theoretically and practically efficient multicore code. Ph.D. 2022-06-15T13:03:04Z 2022-06-15T13:03:04Z 2022-02 2022-03-04T20:47:45.889Z Thesis https://hdl.handle.net/1721.1/143200 https://orcid.org/0000-0003-2232-3305 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Xu, Helen Jiang The Locality-First Strategy for Developing Efficient Multicore Algorithm |
title | The Locality-First Strategy for Developing Efficient Multicore Algorithm |
title_full | The Locality-First Strategy for Developing Efficient Multicore Algorithm |
title_fullStr | The Locality-First Strategy for Developing Efficient Multicore Algorithm |
title_full_unstemmed | The Locality-First Strategy for Developing Efficient Multicore Algorithm |
title_short | The Locality-First Strategy for Developing Efficient Multicore Algorithm |
title_sort | locality first strategy for developing efficient multicore algorithm |
url | https://hdl.handle.net/1721.1/143200 https://orcid.org/0000-0003-2232-3305 |
work_keys_str_mv | AT xuhelenjiang thelocalityfirststrategyfordevelopingefficientmulticorealgorithm AT xuhelenjiang localityfirststrategyfordevelopingefficientmulticorealgorithm |