Abstract

Core counts in contemporary processors keep increasing and thus continue to deliver ever higher peak performance (following Moore's law on transistor counts). Yet high core counts present a challenge to hardware and software alike. Following this trend, the network-on-chip (NoC) topology has changed from buses, through rings and fully connected meshes, to 2D meshes. This work contributes NoCMsg, a low-level message-passing abstraction over NoCs that is specifically designed for large core counts in 2D meshes. NoCMsg ensures deadlock-free messaging for wormhole Manhattan-path routing over the NoC via a polling-based message abstraction and non-flow-controlled communication for selective communication patterns. Experimental results on the TilePro hardware platform show that NoCMsg reduces communication times by up to 86% for single-packet messages and by up to 40% for larger messages compared to other NoC-based messaging approaches. On the same platform, NoCMsg outperforms shared-memory abstractions by up to 93% as core counts and interprocess communication increase. Results for fully pipelined double-precision numerical codes show speedups of up to 64% for message passing over shared memory at 32 cores. Overall, we observe that shared memory scales up to about 16 cores on this platform, whereas message passing performs well beyond that threshold. These results generalize to similar NoC-based platforms.

  • Publication date: 2015-4