Rapid memory-aware selection of hardware accelerators in programmable SoC design

Programmable Systems-on-Chips (SoCs) are expected to incorporate a larger number of application-specific hardware accelerators with tightly integrated memories in order to meet stringent performance-power requirements of embedded systems. As data sharing between the accelerator memories and the proc...

Szczegółowa specyfikacja

Opis bibliograficzny
Główni autorzy: Prakash, Alok, Clarke, Christopher T., Lam, Siew-Kei, Srikanthan, Thambipillai
Kolejni autorzy: School of Computer Science and Engineering
Format: Journal Article
Język:English
Wydane: 2019
Hasła przedmiotowe:
Dostęp online:https://hdl.handle.net/10356/99051
http://hdl.handle.net/10220/48550
Opis
Streszczenie:Programmable Systems-on-Chips (SoCs) are expected to incorporate a larger number of application-specific hardware accelerators with tightly integrated memories in order to meet stringent performance-power requirements of embedded systems. As data sharing between the accelerator memories and the processor is inevitable, it is of paramount importance that the selection of application segments for hardware acceleration must be undertaken such that the communication overhead of data transfers do not impede the advantages of the accelerators. In this paper, we propose a novel memory-aware selection algorithm that is based on an iterative approach to rapidly recommend a set of hardware accelerators that will provide high performance gain under varying area constraint. In order to significantly reduce the algorithm runtime while still guaranteeing near-optimal solutions, we propose a heuristic to estimate the penalties incurred when the processor accesses the accelerator memories. In each iteration of the proposed algorithm, a two-pass method is employed where a set of good hardware accelerator candidates is selected using a greedy approach in the first pass, and a “sliding window” approach is used in the second pass to refine the solution. The two-pass method is iteratively performed on a bounded set of candidate hardware accelerators to limit the search space and to avoid local maxima. In order to validate the benefits of the proposed selection algorithm, an exhaustive search algorithm is also developed. Experimental results using the popular CHStone benchmark suite show that the performance achieved by the accelerators recommended by the proposed algorithm closely matches the performance of the exhaustive algorithm, with close to 99% accuracy, while being orders of magnitude faster.