A Practical Data Classification Framework for Scalable and High Performance Chip-Multiprocessors

Li Yong*; Melhem Rami; Jones Alex K

doi:10.1109/TC.2013.161

Abstract

State-of-the-art chip multiprocessor (CMP) proposals emphasize general optimizations designed to deliver computing power for many types of applications. This approach potentially misses significant performance improvements that leverage application-specific characteristics such as data access behavior. In this paper, we demonstrate how scalable and high-performance parallel systems can be built by classifying data accesses into different categories and treating them differently. We develop a novel compiler-based approach to speculatively detect a data classification termed practically private, which we demonstrate is ubiquitous in a wide range of parallel applications. Leveraging this classification provides efficient solutions to mitigate data access latency and coherence overhead in today's many-core architectures. While the proposed data classification scheme can be applied to many micro-architectural constructs including the TLB, coherence directory, and interconnect, we demonstrate its potential through an efficient cache coherence design. Specifically, we show that the compiler-assisted mechanism reduces coherence traffic by an average of 46% and achieves up to 12%, 8%, and 5% performance improvement over shared, private, and state-of-the-art NUCA-based caching, respectively, depending on the scenario.

Publication date: 2014-12
