Hardware implementation of a power efficient CGRA with single-cycle multi-hop datapaths

Bibliographic Details
Main Author: Su, Lingzhi
Other Authors: Goh Wang Ling
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2022
Subjects:
Online Access: https://hdl.handle.net/10356/158595
Description
Summary: Coarse-grained reconfigurable architectures (CGRAs) are computing architectures that provide word-level reconfigurability. CGRAs can achieve high throughput and high power efficiency while maintaining post-fabrication computing flexibility. In this dissertation, a 167-GOPS/W CGRA design with single-cycle multi-hop datapaths, named PACE, is proposed. The hardware architecture, including the processing element (PE), arithmetic logic unit (ALU), routers, and on-chip peripherals, is presented. An ALU input gating technique and a no-operation (NOP) clock gating technique are integrated into the PEs, reducing the power consumption of the ALU and the PE module by 68.66% and 39.11%, respectively. In terms of hardware implementation, a memory timing problem and a combinational logic loop issue are discussed. A memory interface circuit with buffer registers and an inverted clock is added to the static random-access memory (SRAM) devices to achieve high speed and single-cycle latency. Constraints on combinational logic loops are presented for both the flat and hierarchical synthesis flows, enabling detailed static timing analysis of the bypass datapaths. Demonstrations based on an FPGA and a computer system are also provided, successfully running applications such as array addition and general matrix multiplication (GEMM). A PACE chip is also successfully implemented in silicon at a frequency of 100 MHz.