Similarity-Aware Architecture/Compiler Co-Designed Context-Reduction Framework for Modulo-Scheduled CGRA

Modulo-scheduled coarse-grained reconfigurable array (CGRA) processors have shown their potential for exploiting loop-level parallelism at high energy efficiency. However, these CGRAs need frequent reconfiguration during their execution, which makes them suffer from large area and power overhead for...


Bibliographic Details
Main Authors: Zhongyuan Zhao, Weiguang Sheng, Jinchao Li, Pengfei Ye, Qin Wang, Zhigang Mao
Format: Article
Language: English
Published: MDPI AG 2021-09-01
Series: Electronics
Subjects: CGRA, similarity-aware, context reduction, modulo scheduling, simulated annealing
Online Access: https://www.mdpi.com/2079-9292/10/18/2210
author Zhongyuan Zhao
Weiguang Sheng
Jinchao Li
Pengfei Ye
Qin Wang
Zhigang Mao
collection DOAJ
description Modulo-scheduled coarse-grained reconfigurable array (CGRA) processors have shown their potential for exploiting loop-level parallelism at high energy efficiency. However, these CGRAs need frequent reconfiguration during execution, which incurs large area and power overhead for context memory and context fetching. To tackle this challenge, this paper uses an architecture/compiler co-designed method for context reduction. From the architecture perspective, we partition the context into several subsections and, when fetching a new context, fetch only the subsections that differ from the previous context word. We package each differing subsection with an opcode and an index value to form a context-fetching primitive (CFP), and explore the hardware design space by providing centralized and distributed CFP-fetching CGRAs that support this CFP-based context-fetching scheme. From the software side, we develop a similarity-aware tuning algorithm and integrate it into state-of-the-art modulo scheduling and memory access conflict optimization algorithms. The whole compilation flow efficiently improves the similarity between contexts in each processing element (PE), reducing both context-fetching latency and context footprint. Experimental results show that our HW/SW co-designed framework improves area efficiency and energy efficiency by up to 34% and 21%, respectively, with only 2% performance overhead.
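The CFP idea in the description can be illustrated with a minimal Python sketch. It is not the authors' encoding: the subsection widths, the single `"UPDATE"` opcode, and the names `CFP`, `split_context`, `encode_delta`, and `apply_cfps` are all assumptions made for illustration. The sketch splits a context word into subsections, emits one primitive per subsection that differs from the previous context word, and rebuilds the new word from the previous word plus those primitives.

```python
from typing import List, NamedTuple


class CFP(NamedTuple):
    """Hypothetical context-fetching primitive: opcode + subsection index + new value."""
    opcode: str  # assumed single opcode; the paper's opcode set is not given in this record
    index: int   # which subsection of the context word to overwrite
    value: int   # new contents of that subsection


def split_context(word: int, widths: List[int]) -> List[int]:
    """Split a context word into subsections, least-significant subsection first."""
    fields = []
    for width in widths:
        fields.append(word & ((1 << width) - 1))
        word >>= width
    return fields


def encode_delta(prev: int, new: int, widths: List[int]) -> List[CFP]:
    """Emit one CFP for each subsection of `new` that differs from `prev`."""
    pairs = zip(split_context(prev, widths), split_context(new, widths))
    return [CFP("UPDATE", i, n) for i, (p, n) in enumerate(pairs) if p != n]


def apply_cfps(prev: int, cfps: List[CFP], widths: List[int]) -> int:
    """Rebuild the new context word from the previous word and its CFPs."""
    fields = split_context(prev, widths)
    for cfp in cfps:
        fields[cfp.index] = cfp.value
    word, shift = 0, 0
    for value, width in zip(fields, widths):
        word |= value << shift
        shift += width
    return word


# Assumed layout: a 16-bit context word with subsections of 4, 4, and 8 bits.
WIDTHS = [4, 4, 8]
prev_word, new_word = 0x1234, 0x1274
cfps = encode_delta(prev_word, new_word, WIDTHS)
restored = apply_cfps(prev_word, cfps, WIDTHS)
```

In this example only the middle subsection changes, so a single primitive is fetched instead of the whole word. The more similar consecutive contexts are, the fewer primitives are needed, which is the property the similarity-aware compilation flow is described as improving.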
first_indexed 2024-03-10T07:44:19Z
format Article
id doaj.art-3d6079685a264e08994b395ae9855f77
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-10T07:44:19Z
publishDate 2021-09-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-3d6079685a264e08994b395ae9855f77
2023-11-22T12:47:30Z
eng
MDPI AG
Electronics
2079-9292
2021-09-01
Volume 10, Issue 18, Article 2210
10.3390/electronics10182210
Similarity-Aware Architecture/Compiler Co-Designed Context-Reduction Framework for Modulo-Scheduled CGRA
Zhongyuan Zhao (Department of Micro/Nano Electronics, Shanghai Jiaotong University, Shanghai 200240, China)
Weiguang Sheng (Department of Micro/Nano Electronics, Shanghai Jiaotong University, Shanghai 200240, China)
Jinchao Li (Huawei Technologies Shanghai, Shanghai 200299, China)
Pengfei Ye (Intel Asia Pacific Development, Shanghai 200241, China)
Qin Wang (Department of Micro/Nano Electronics, Shanghai Jiaotong University, Shanghai 200240, China)
Zhigang Mao (Department of Micro/Nano Electronics, Shanghai Jiaotong University, Shanghai 200240, China)
title Similarity-Aware Architecture/Compiler Co-Designed Context-Reduction Framework for Modulo-Scheduled CGRA
topic CGRA
similarity-aware
context reduction
modulo scheduling
simulated annealing
url https://www.mdpi.com/2079-9292/10/18/2210