Optimizing Vector Instruction Selection for Digital Signal Processing

Digital signal processing applications benefit from fast implementations of vectorized inner kernels. Existing compilers rely on brittle pattern-matching or search-based methods with poor scalability for vector instruction selection – techniques which are limited by a reliance on the syntax of the i...

Full description

Bibliographic Details
Main Author: Root, Alexander James
Other Authors: Ragan-Kelley, Jonathan
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access:https://hdl.handle.net/1721.1/144935
Description
Summary:Digital signal processing applications benefit from fast implementations of vectorized inner kernels. Existing compilers rely on brittle pattern-matching or search-based methods with poor scalability for vector instruction selection – techniques which are limited by a reliance on the syntax of the input code. These techniques struggle to utilize the efficient fused instructions that exist on modern hardware. This thesis extends the Rake synthesis-based optimizing compiler to target the ARM Neon ISA via the design of a high-level intermediate representation for vector computation, with each component of the IR unifying multiple concrete instructions for the target ISA. This technique relies on the semantics of the input code, rather than the syntax alone, allowing for powerful equivalent rewrites that existing compilers are currently incapable of performing. On 11 real-world benchmarks, our system achieves up to a 65% faster runtime (geometric mean of 12%) than the Halide and LLVM vector instruction selectors that have been developed over the past decade.