Similarity-Aware Architecture/Compiler Co-Designed Context-Reduction Framework for Modulo-Scheduled CGRA
Main Authors: | Zhongyuan Zhao, Weiguang Sheng, Jinchao Li, Pengfei Ye, Qin Wang, Zhigang Mao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-09-01 |
Series: | Electronics |
Subjects: | CGRA; similarity-aware; context reduction; modulo scheduling; simulated annealing |
Online Access: | https://www.mdpi.com/2079-9292/10/18/2210 |
_version_ | 1797519546584662016 |
---|---|
author | Zhongyuan Zhao Weiguang Sheng Jinchao Li Pengfei Ye Qin Wang Zhigang Mao |
collection | DOAJ |
description | Modulo-scheduled coarse-grained reconfigurable array (CGRA) processors have shown their potential for exploiting loop-level parallelism with high energy efficiency. However, these CGRAs must be reconfigured frequently during execution, which incurs large area and power overheads for context memory and context fetching. To tackle this challenge, this paper uses an architecture/compiler co-design method for context reduction. On the architecture side, we partition each context word into several subsections and, whenever a new context is fetched, fetch only the subsections that differ from the previous context word. We package each differing subsection with an opcode and an index value to form a context-fetching primitive (CFP), and we explore the hardware design space by providing both centralized and distributed CFP-fetching CGRAs that support this CFP-based context-fetching scheme. On the software side, we develop a similarity-aware tuning algorithm and integrate it into state-of-the-art modulo scheduling and memory access conflict optimization algorithms. The resulting compilation flow increases the similarity between contexts within each processing element (PE), reducing both context-fetching latency and context footprint. Experimental results show that our HW/SW co-designed framework improves area efficiency and energy efficiency by up to 34% and 21%, respectively, with only 2% performance overhead. |
first_indexed | 2024-03-10T07:44:19Z |
format | Article |
id | doaj.art-3d6079685a264e08994b395ae9855f77 |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-10T07:44:19Z |
publishDate | 2021-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
doi | 10.3390/electronics10182210 |
citation | Electronics, 2021, vol. 10, no. 18, article 2210 |
affiliation | Zhongyuan Zhao, Weiguang Sheng, Qin Wang, Zhigang Mao: Department of Micro/Nano Electronics, Shanghai Jiaotong University, Shanghai 200240, China; Jinchao Li: Huawei Technologies Shanghai, Shanghai 200299, China; Pengfei Ye: Intel Asia Pacific Development, Shanghai 200241, China |
title | Similarity-Aware Architecture/Compiler Co-Designed Context-Reduction Framework for Modulo-Scheduled CGRA |
topic | CGRA similarity-aware context reduction modulo scheduling simulated annealing |
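
The CFP-based fetching scheme summarized in the description field can be illustrated with a minimal Python sketch. It assumes a 32-bit context word split into four 8-bit subsections. The subsection widths, the opcode values (`OP_WRITE`, `OP_KEEP`), the CFP field layout and the helper names (`encode`, `decode`, `fetched_subsections`) are all hypothetical; this record does not give the paper's actual encoding. The count returned by `fetched_subsections` stands in for the kind of cost that a similarity-aware tuning pass would aim to reduce.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical layout: a 32-bit context word split into four 8-bit subsections.
NUM_SUBSECTIONS = 4
SUBSECTION_BITS = 8
MASK = (1 << SUBSECTION_BITS) - 1

# Hypothetical opcodes; the record does not give the paper's actual CFP encoding.
OP_WRITE = 0  # overwrite one subsection of the previous context word
OP_KEEP = 1   # the new context word equals the previous one; nothing to fetch


@dataclass(frozen=True)
class CFP:
    """Context-fetching primitive: an opcode, a subsection index and its payload."""
    opcode: int
    index: int
    value: int


def split(word: int) -> List[int]:
    """Split a context word into subsections, least significant first."""
    return [(word >> (i * SUBSECTION_BITS)) & MASK for i in range(NUM_SUBSECTIONS)]


def join(parts: List[int]) -> int:
    """Inverse of split: pack subsections back into one context word."""
    word = 0
    for i, part in enumerate(parts):
        word |= (part & MASK) << (i * SUBSECTION_BITS)
    return word


def encode(prev: int, new: int) -> List[CFP]:
    """Emit one CFP for each subsection of `new` that differs from `prev`."""
    cfps = [
        CFP(OP_WRITE, i, n)
        for i, (o, n) in enumerate(zip(split(prev), split(new)))
        if o != n
    ]
    return cfps or [CFP(OP_KEEP, 0, 0)]


def decode(prev: int, cfps: List[CFP]) -> int:
    """Rebuild the new context word by patching the previous one."""
    parts = split(prev)
    for cfp in cfps:
        if cfp.opcode == OP_WRITE:
            parts[cfp.index] = cfp.value
    return join(parts)


def fetched_subsections(contexts: List[int], initial: int = 0) -> int:
    """Count subsections fetched for one PE's context sequence (lower = more similar)."""
    total, prev = 0, initial
    for ctx in contexts:
        total += sum(1 for cfp in encode(prev, ctx) if cfp.opcode == OP_WRITE)
        prev = ctx
    return total


if __name__ == "__main__":
    seq = [0x11223344, 0x11223355, 0x11AA3355]
    prev = 0
    for ctx in seq:
        assert decode(prev, encode(prev, ctx)) == ctx  # round-trip check
        prev = ctx
    print(fetched_subsections(seq))  # → 6
```

Starting from an all-zero word, the first context in the example costs four subsections and each later context costs one, for 6 fetched subsections instead of 12. A compiler working in this spirit would place operations so that consecutive contexts in a PE share more subsections. The record does not describe how the paper's tuning algorithm scores or searches schedules.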