Summary: | This thesis investigates the performance of reducer hyperobjects, a feature of the Cilk task-parallel runtime system that enables concurrent associative updates to nonlocal variables. Reducers are more performant than more traditional methods of enabling concurrent updates, such as locking and atomic updates. Unfortunately, existing reducer implementations can suffer a cost of up to 10 times that of a serial update, depending on the benchmark.
This overhead incurred by reducers can be decreased by three approaches: runtime data structures, compiler-runtime integration, and compiler optimization. When these approaches are used to performance-engineer the OpenCilk runtime system's reducers, the overall performance of a benchmark suite designed to stress-test reducers sees a geometric average improvement of 25.74% and a maximum improvement of 88.32%.
This thesis also investigates ``commutative'' reducers, a proposed reducer optimization premised on restricted semantics, but finds they yield a performance degredation while being linguistically unwieldy. The examination of commutative reducers leads to an empirical investigation of the scalability of traditional reducers, finding a quadratic upper bound on their performance to be loose.
Parts of this thesis represent joint work with Charles E. Leiserson, Tim Kaler, William S. Moses, Qi Qi, and Tao B. Schardl of MIT and I-Ting Angelina Lee of Washington University in St .Louis.
|