摘要

In chip multiprocessors (CMPs), maintaining cache coherence can account for a major performance overhead. Write-invalidate protocols adapted by most CMPs generate high cache-to-cache misses under producer-consumer sharing patterns. Accordingly, this paper presents three cache coherence mechanisms optimized for CMPs. First, to reduce coherence misses observed in write-invalidate-based protocols, we propose a dynamic write-update mechanism augmented on top of a write-invalidate protocol. This mechanism is specifically triggered at the detection of a producer-consumer sharing pattern. Second, we extend this adaptive protocol with a bandwidth-adaptive mechanism to eliminate performance degradation from write-updates under limited bandwidth. Finally, proximity-aware mechanism is proposed to extend the base adaptive protocol with latency-based optimizations. Experimental analysis is conducted on a set of scientific applications from the SPLASH-2 and NAS parallel benchmark suites. The proposed mechanisms were shown to reduce coherence misses by up to 48% and in return speed up application performance up to 30%. Bandwidth-adaptive mechanism is proven to perform well under varying levels of available bandwidth. Results from our proposed proximity-aware extension demonstrated up to 6% performance gain over the base adaptive protocol for 64-core tiled CMP runs. In addition, the analytical model provided good estimates for performance gains from our adaptive protocols.

  • 出版日期2015-2