Summary: | We will examine state-of-the-art approaches to sparse-dense matrix multiplication (SpMDM), with a focus on graph machine learning workloads such as graph neural networks (GNNs), though this work is general enough that it should apply to any application whose matrix multiplication workloads cannot fit in memory. Specifically, we will conduct a thorough analysis of various optimization strategies, including sparse matrix formats, tiling, load balancing, and data locality, and investigate how each affects performance. Based on this performance study, we will design and implement an out-of-core framework that supports massive graph datasets that cannot fit into memory. We foresee challenges in mitigating the overhead of accessing external storage, as well as in balancing performance against CPU/GPU memory usage. We will compare our out-of-core solution with state-of-the-art in-memory and distributed solutions, and analyze the algorithmic complexity and overall overhead of our implementation. A minimal sketch of the tile-based, out-of-core execution model we have in mind appears below.
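To make the out-of-core idea concrete, the following Python sketch streams CSR row tiles of the sparse matrix from external storage one at a time, multiplies each resident tile by a memory-mapped dense operand, and writes the result rows incrementally. This is an illustrative assumption of how tiling bounds memory usage, not the proposed framework; the file layout, tile granularity, and helper names (`out_of_core_spmm`, `tile_paths`) are hypothetical.

```python
# Illustrative out-of-core SpMM sketch: C = A @ B, where A is sparse and
# stored on disk as pre-partitioned CSR row tiles (.npz files), and B is a
# dense matrix kept on disk via memory mapping. Only one tile of A and the
# corresponding slice of C are resident in memory at any time.
import numpy as np
import scipy.sparse as sp

def out_of_core_spmm(tile_paths, dense_path, out_path, n_rows, k):
    # Memory-map the dense operand so it is paged in on demand.
    B = np.load(dense_path, mmap_mode="r")
    # Create the output as a disk-backed array so C never fully resides in RAM.
    C = np.lib.format.open_memmap(out_path, mode="w+",
                                  dtype=B.dtype, shape=(n_rows, k))
    row = 0
    for path in tile_paths:
        # Load one row tile of the sparse matrix into memory.
        tile = sp.load_npz(path).tocsr()
        # Dense result rows for this tile are written straight to disk.
        C[row:row + tile.shape[0]] = tile @ B
        row += tile.shape[0]
    C.flush()
```

The per-tile working set here is one sparse tile plus its output slice, which is what an out-of-core design must trade off against I/O overhead: smaller tiles cut peak memory but increase the number of storage accesses.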
|