RLARA: A TSV-Aware Reinforcement Learning Assisted Fault-Tolerant Routing Algorithm for 3D Network-on-Chip

A three-dimensional Network-on-Chip (3D NoC) equips modern multicore processors with good scalability, a small area, and high performance using vertical through-silicon vias (TSV). However, the failure rate of TSV, which is higher than that of horizontal links, causes unpredictable topology variatio...

Bibliographic Details
Main Authors:	Jiajia Jiao, Ruirui Shen, Lujian Chen, Jin Liu, Dezhi Han
Format:	Article
Language:	English
Published:	MDPI AG 2023-12-01
Series:	Electronics
Subjects:	deadlock mitigation; fault tolerance; K-means clustering; through silicon vias; 3D network-on-chip; reinforcement learning
Online Access:	https://www.mdpi.com/2079-9292/12/23/4867

author	Jiajia Jiao, Ruirui Shen, Lujian Chen, Jin Liu, Dezhi Han
author_sort	Jiajia Jiao
collection	DOAJ
description	A three-dimensional Network-on-Chip (3D NoC) equips modern multicore processors with good scalability, a small area, and high performance using vertical through-silicon vias (TSVs). However, the failure rate of TSVs, which is higher than that of horizontal links, causes unpredictable topology variations and requires adaptive routing algorithms to select the available paths dynamically. Most works have aimed at congestion control for TSV partially connected 3D NoCs to bypass the TSV reliability issue, while others have focused on fault tolerance in TSV fully connected 3D NoCs and ignored the performance degradation. To improve both reliability and performance in TSV fully connected 3D NoC architectures, we propose a TSV-aware Reinforcement Learning Assisted Routing Algorithm (RLARA) for fault-tolerant 3D NoCs. The proposed method takes advantage of both the high throughput of fully connected TSVs and the cost-effective fault tolerance of partially connected TSVs by using a periodically updated, TSV-aware Q table from reinforcement learning. RLARA makes distributed routing decisions that favor the lowest TSV utilization, which avoids overheating the TSVs and mitigates the reliability problem. Furthermore, the K-means clustering algorithm is adopted to compress the routing table of RLARA by exploiting the similarity of routing information. To alleviate the inherent deadlock issue of adaptive routing algorithms, the link Q-value from reinforcement learning is combined with the router status, based on buffer utilization, to predict congestion and enable RLARA to perform best even under a high traffic load. The experimental results of the ablation study on the Garnet 2.0 simulator verify the effectiveness of the proposed RLARA under different fault models; it performs better than the latest 3D NoC routing algorithms, with up to 9.04% lower average delay and an 8.58% higher successful delivery rate.
first_indexed	2024-03-09T01:52:59Z
format	Article
id	doaj.art-8043e03e42c44436960fbe273f936238
institution	Directory Open Access Journal
issn	2079-9292
language	English
last_indexed	2024-03-09T01:52:59Z
publishDate	2023-12-01
publisher	MDPI AG
record_format	Article
series	Electronics
spelling	doaj.art-8043e03e42c44436960fbe273f936238; 2023-12-08T15:14:17Z; eng; MDPI AG; Electronics; ISSN 2079-9292; 2023-12-01; vol. 12, no. 23, art. 4867; DOI 10.3390/electronics12234867; Jiajia Jiao, Ruirui Shen, Lujian Chen, Jin Liu, Dezhi Han (all: College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China)
title	RLARA: A TSV-Aware Reinforcement Learning Assisted Fault-Tolerant Routing Algorithm for 3D Network-on-Chip
topic	deadlock mitigation; fault tolerance; K-means clustering; through silicon vias; 3D network-on-chip; reinforcement learning
url	https://www.mdpi.com/2079-9292/12/23/4867
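
The description above mentions a per-router Q table whose link Q-values are combined with buffer utilization to choose among healthy links. The following is a minimal sketch of that general idea only, not the authors' implementation: the class `Router`, the parameters `alpha` and `beta`, the port names, and the additive scoring rule are all assumptions made for illustration, and the update rule is the generic Q-routing form rather than anything taken from the article.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical per-router state; every name here is illustrative."""
    # q[dest][port] = estimated cost of reaching dest via port (lower is better)
    q: dict = field(default_factory=dict)
    alpha: float = 0.5  # learning rate (assumed value)
    beta: float = 1.0   # weight of buffer utilization in the score (assumed value)

    def estimate(self, dest, port):
        # Unknown entries default to 0.0 (optimistic initialization).
        return self.q.get(dest, {}).get(port, 0.0)

    def update(self, dest, port, hop_delay, neighbor_best):
        """Generic Q-routing update: move the estimate toward the observed
        hop delay plus the neighbor's best remaining estimate."""
        old = self.estimate(dest, port)
        new = old + self.alpha * (hop_delay + neighbor_best - old)
        self.q.setdefault(dest, {})[port] = new
        return new

    def select_port(self, dest, candidates, buffer_util, faulty):
        """Return the non-faulty port with the lowest combined score
        Q + beta * buffer utilization, or None if every candidate is faulty."""
        healthy = [p for p in candidates if p not in faulty]
        if not healthy:
            return None
        return min(
            healthy,
            key=lambda p: self.estimate(dest, p) + self.beta * buffer_util.get(p, 0.0),
        )
```

In this sketch, a faulty vertical link is simply excluded from the candidate set, and a heavily occupied buffer raises a port's score so that traffic shifts to another port before the Q-value itself has had time to adapt.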

Similar Items