Packing Multiple Types of Cores for Energy-Optimized Heterogeneous Hardware-Software Co-Design of Moldable Streaming Computations

Bibliographic Details
Main Authors: Sebastian Litzinger, Jorg Keller, Christoph Kessler
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10050873/
Description
Summary: For fixed-application scenarios in embedded soft-realtime computing, the ideal (w.r.t. energy consumption) heterogeneous multi-core CPU design within given chip dimensions can be configured by composing it from given pre-layouted, rectangular chip submodules for each of a number $K>1$ of core types, where $K$ in practice is a small constant. For example, $K=2$ in traditional ARM big.LITTLE designs. Nevertheless, even better solutions might be achieved for $K>2$, and many feasible combinations can exist. For this purpose, we investigate finding all combinations of instances of $K>1$ different types of given axis-parallel rectangles that can be packed within a given fixed-size 2D rectangle, and we propose two new packing heuristics: the corner heuristic for $K\leq 4$, and the onion heuristic for larger $K$. Both heuristics strive to pack cores of the same type close together, to simplify implementation of on-chip bus and shared cache structures. The core combinations can be used in co-optimizing chip configuration, task mapping and scheduling for stream processing applications. We evaluate the corner heuristic for a number of different types of ARM softcores and chip dimensions, and show that it outperforms strip packing techniques from the literature and yields similar results to an advanced rectpack heuristic allowing rotation, though these do not try to pack similar cores closely.
ISSN: 2169-3536
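
The problem stated in the summary, listing the combinations of core counts per type that fit on a fixed-size chip, can be illustrated with a small sketch. The code below is not the corner or onion heuristic from the article. It substitutes a plain shelf (row-by-row) layout without rotation as a conservative feasibility check, and the function names `shelf_fits` and `feasible_combinations`, as well as the core and chip dimensions in the usage note, are hypothetical values chosen for illustration only.

```python
from itertools import product


def shelf_fits(counts, sizes, chip_w, chip_h):
    """Return True if a simple shelf layout places all requested cores.

    Cores are placed type by type, left to right, without rotation, so
    cores of the same type end up next to each other. A False result does
    not prove that no packing exists: the check is sufficient, not necessary.
    """
    x = y = shelf_h = 0
    for n, (w, h) in zip(counts, sizes):
        if n and (w > chip_w or h > chip_h):
            return False                # a single core is already too large
        for _ in range(n):
            if x + w > chip_w:          # current row is full: open a new shelf
                x, y, shelf_h = 0, y + shelf_h, 0
            if y + h > chip_h:          # core would extend past the chip edge
                return False
            x += w
            shelf_h = max(shelf_h, h)
    return True


def feasible_combinations(sizes, chip_w, chip_h):
    """Enumerate non-empty count tuples accepted by the shelf check."""
    chip_area = chip_w * chip_h
    # Area bound: at most floor(chip_area / core_area) cores of one type.
    limits = [chip_area // (w * h) for w, h in sizes]
    combos = []
    for counts in product(*(range(m + 1) for m in limits)):
        if sum(counts) and shelf_fits(counts, sizes, chip_w, chip_h):
            combos.append(counts)
    return combos


# Hypothetical example: K = 2 core types on a 4 x 2 chip (arbitrary units).
sizes = [(2, 2), (1, 1)]                # (width, height) of "big" and "little"
combos = feasible_combinations(sizes, chip_w=4, chip_h=2)
```

With these assumed dimensions the shelf check accepts one big core with two little cores, but rejects one big core with four little cores, although that combination does fit (the four little cores fill the free 2 x 2 area). This gap is the kind of loss that a stronger packing heuristic is meant to reduce.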