Abstract

Aggressive scaling of CMOS process technology allows the fabrication of highly integrated chips and enables the design of networks-on-chip (NoCs). However, it also leads to widespread reliability problems. A reliable NoC system must continue to operate correctly even in the presence of many transistor failures. Targeting permanent faults on communication links, we introduce a fault-tolerant, MPI-like communication protocol. The protocol detects a link failure when a request goes unanswered and automatically starts exploring a new path. We propose a region flooding algorithm that searches for a fault-free path and reroutes packets to avoid system stalls. Experimental results show that our approach significantly reduces latency compared with the basic flooding algorithm. The maximum latency reduction is 25% under the bit-complement traffic pattern, at the cost of only a 2% loss in fault tolerance.
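
The idea of restricting the flood to a region can be illustrated with a minimal sketch. The abstract does not define the region, so everything below is an assumption made for illustration: a 2D mesh topology, a region equal to the bounding box of source and destination grown by a hypothetical `margin` parameter, and the names `region_flood` and `faulty_links`. It is not the paper's implementation.

```python
from collections import deque


def region_flood(src, dst, width, height, faulty_links, margin=1):
    """Breadth-first flood on a width x height mesh, limited to a region.

    Assumption: the region is the bounding box of src and dst, grown by
    `margin` hops and clipped to the mesh. `faulty_links` is a set of
    frozenset({node_a, node_b}) pairs marking permanently failed links.
    Returns a fault-free path as a list of (x, y) nodes, or None.
    """
    x_lo = max(0, min(src[0], dst[0]) - margin)
    x_hi = min(width - 1, max(src[0], dst[0]) + margin)
    y_lo = max(0, min(src[1], dst[1]) - margin)
    y_hi = min(height - 1, max(src[1], dst[1]) + margin)

    parent = {src: None}          # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            # Exploration packets never leave the region.
            if not (x_lo <= nxt[0] <= x_hi and y_lo <= nxt[1] <= y_hi):
                continue
            # Skip failed links and nodes that were already reached.
            if frozenset((node, nxt)) in faulty_links or nxt in parent:
                continue
            parent[nxt] = node
            queue.append(nxt)
    return None  # no fault-free path inside the region


if __name__ == "__main__":
    faulty = {frozenset(((0, 0), (1, 0)))}
    print(region_flood((0, 0), (2, 0), 4, 4, faulty, margin=1))
    print(region_flood((0, 0), (2, 0), 4, 4, faulty, margin=0))
```

In this sketch a smaller region means fewer nodes are visited, but a detour that lies outside the region is missed: with `margin=0` the second call finds no path even though one exists in the full mesh. That is one plausible reading of the trade-off between latency and fault tolerance reported above.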