A Stream Compiler for Communication-Exposed Architectures

Bibliographic Details
Main Authors: Gordon, Michael; Thies, William; Karczmarek, Michael; Wong, Jeremy; Hoffmann, Henry; Maze, David; Amarasinghe, Saman
Published: 2023
Online Access: https://hdl.handle.net/1721.1/149316
Description
Summary: With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed communication between one unit and another (e.g., Raw, iWarp, SmartMemories). However, for their use to be widespread, it will be necessary to develop compiler technology that enables a portable, high-level language to execute efficiently across a range of wire-exposed architectures. In this paper, we describe our compiler for StreamIt: a high-level, architecture-independent language for streaming applications. We focus on our backend for the Raw processor. Though StreamIt exposes the parallelism and communication patterns of stream programs, much analysis is needed to adapt a stream program to a parallel stream processor. We describe fission and fusion transformations that can be used to adjust the granularity of a stream graph, a layout algorithm for mapping a stream graph to a given network topology, and a scheduling algorithm for generating a fine-grained static communication pattern for each computational element. We have implemented a fully functional compiler that parallelizes StreamIt applications for Raw, including several load-balancing optimizations. Using the cycle-accurate Raw simulator, we demonstrate that these optimizations can improve performance by up to 145%. We consider this work to be a first step towards a portable programming model for communication-exposed architectures.