Automated Mapping of Task-Based Programs onto Distributed and Heterogeneous Machines

Bibliographic Details
Main Authors: Teixeira, Thiago S. F. X.; Henzinger, Alexandra; Yadav, Rohan; Aiken, Alex
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: English
Published: ACM | The International Conference for High Performance Computing, Networking, Storage and Analysis 2023
Online Access: https://hdl.handle.net/1721.1/153148
Description
Summary: In a parallel and distributed application, a mapping is a selection of a processor for each computation or task and memories for the data collections that each task accesses. Finding high-performance mappings is challenging, particularly on heterogeneous hardware with multiple choices for processors and memories. We show that fast mappings are sensitive to the machine, application, and input. Porting to a new machine, modifying the application, or using a different input size may necessitate re-tuning the mapping to maintain the best possible performance. We present AutoMap, a system that automatically tunes the mapping to the hardware used and finds fast mappings without user intervention or code modification. In contrast, hand-written mappings often require days of experimentation. AutoMap utilizes a novel constrained coordinate-wise descent search algorithm that balances the trade-off between running computations quickly and minimizing data movement. AutoMap discovers mappings up to 2.41× faster than custom, hand-written mappers.
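Illustrative sketch: the summary describes a mapping as a choice of processor per task and memory per data collection, searched by a constrained coordinate-wise descent. The Python below is a minimal, generic sketch of that idea and is not AutoMap's implementation. Everything in it is an assumption made for illustration: the processor and memory names, the `VISIBLE` addressability table, the task names, and above all `toy_cost`, which stands in for the measured run times a real tuner would presumably use. The only constraint modeled is that a task's collections must live in a memory its processor can address.

```python
# Hypothetical toy machine: which memories each processor kind can address.
VISIBLE = {"CPU": {"SYSMEM", "ZEROCOPY"}, "GPU": {"FBMEM", "ZEROCOPY"}}

# Hypothetical application: each task and the collections it accesses.
TASKS = {"stencil": ["grid"], "reduce": ["grid", "partials"]}

# Made-up costs: compute time per processor, access time per (processor, memory).
COMPUTE = {"CPU": 10.0, "GPU": 2.0}
ACCESS = {
    ("CPU", "SYSMEM"): 1.0, ("CPU", "ZEROCOPY"): 2.0,
    ("GPU", "FBMEM"): 0.5, ("GPU", "ZEROCOPY"): 4.0,
}


def is_valid(mapping):
    """Constraint: every collection must sit in a memory its task's processor can address."""
    for task, collections in TASKS.items():
        proc = mapping[("proc", task)]
        if any(mapping[("mem", c)] not in VISIBLE[proc] for c in collections):
            return False
    return True


def toy_cost(mapping):
    """Stand-in cost: compute time plus data-access time (only defined for valid mappings)."""
    total = 0.0
    for task, collections in TASKS.items():
        proc = mapping[("proc", task)]
        total += COMPUTE[proc]
        total += sum(ACCESS[(proc, mapping[("mem", c)])] for c in collections)
    return total


def coordinate_descent(start, choices, cost, valid, max_sweeps=10):
    """Change one coordinate at a time; keep a change only if it is valid and cheaper."""
    best, best_cost = dict(start), cost(start)
    for _ in range(max_sweeps):
        improved = False
        for key, options in choices.items():
            for option in options:
                if option == best[key]:
                    continue
                candidate = dict(best)
                candidate[key] = option
                if not valid(candidate):
                    continue  # skip mappings that violate the constraint
                c = cost(candidate)
                if c < best_cost:
                    best, best_cost, improved = candidate, c, True
        if not improved:
            break  # a full sweep found nothing better: local optimum
    return best, best_cost


CHOICES = {
    ("proc", "stencil"): ["CPU", "GPU"],
    ("proc", "reduce"): ["CPU", "GPU"],
    ("mem", "grid"): ["SYSMEM", "FBMEM", "ZEROCOPY"],
    ("mem", "partials"): ["SYSMEM", "FBMEM", "ZEROCOPY"],
}
# Start from a mapping that is valid on any processor: everything in shared memory.
START = {
    ("proc", "stencil"): "CPU", ("proc", "reduce"): "CPU",
    ("mem", "grid"): "ZEROCOPY", ("mem", "partials"): "ZEROCOPY",
}

best, best_cost = coordinate_descent(START, CHOICES, toy_cost, is_valid)
print(best, best_cost)
```

With these made-up numbers the search first moves both tasks to the GPU, which is allowed because the shared memory is addressable from either processor, and then moves each collection into the GPU's own memory to cut access cost. That mirrors, in miniature, the trade-off the summary names between fast computation and low data movement. The result is only a local optimum for this toy cost model.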