RC3: Consistency directed cache coherence for x86-64 with RC extensions

Cited by: 7

Authors:

Elver, Marco [1]

Nagarajan, Vijay [1]

Affiliations:

[1] Univ Edinburgh, Edinburgh EH8 9YL, Midlothian, Scotland

Source:

2015 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION (PACT) | 2015

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

multiprocessors; cache coherence; memory consistency models; MODEL;

DOI:

10.1109/PACT.2015.37

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

The recent convergence towards programming-language-based memory consistency models has sparked renewed interest in lazy cache coherence protocols. These protocols exploit synchronization information by enforcing coherence only at synchronization boundaries via self-invalidation. In effect, such protocols do not require sharer tracking, which benefits scalability. On the downside, such protocols are only readily applicable to a restricted set of consistency models, such as Release Consistency (RC), which expose synchronization information explicitly. In particular, existing architectures with stricter consistency models (such as x86-64) cannot readily make use of lazy coherence protocols without either: changing the architecture's consistency model to (a variant of) RC at the expense of backwards compatibility; or adapting the protocol to satisfy the stricter consistency model, thereby failing to benefit from synchronization information. We show an approach for the x86-64 architecture, which is a compromise between the two. First, we propose a mechanism to convey synchronization information via a simple ISA extension, while retaining backwards compatibility with legacy codes and older microarchitectures. Second, we propose RC3, a scalable hardware cache coherence protocol for RCtso, the resulting memory consistency model. RC3 does not track sharers, and relies on self-invalidation on acquires. To satisfy RCtso efficiently, the protocol reduces self-invalidations transitively using per-L1 timestamps only. RC3 outperforms a conventional lazy RC protocol by 12%, achieving performance comparable to a MESI directory protocol for RC-optimized programs. RC3's storage overhead per cache line scales logarithmically with increasing core count, and reduces on-chip coherence storage overheads by 45% compared to a related approach specifically targeting TSO.
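
The general idea of lazy coherence via self-invalidation mentioned in the abstract can be illustrated with a toy software model. This is only a minimal sketch of a generic lazy RC scheme, not the RC3 protocol itself: it omits the per-L1 timestamps, the RCtso model, and the ISA extension, and every name in it (`SharedMemory`, `L1Cache`, `demo`) is a hypothetical choice made for illustration.

```python
class SharedMemory:
    """Toy stand-in for the shared last-level cache / memory."""

    def __init__(self):
        self.data = {}  # address -> value


class L1Cache:
    """Toy private cache with no sharer tracking (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # address -> locally cached value
        self.dirty = set()  # addresses written locally, not yet written back

    def read(self, addr):
        # On a miss, fetch from shared memory; on a hit, the value may be stale.
        if addr not in self.lines:
            self.lines[addr] = self.memory.data.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # Writes stay local until the next release.
        self.lines[addr] = value
        self.dirty.add(addr)

    def release(self):
        # Release: make all local writes visible in shared memory.
        for addr in self.dirty:
            self.memory.data[addr] = self.lines[addr]
        self.dirty.clear()

    def acquire(self):
        # Acquire: self-invalidate every clean line, so that later reads
        # refetch from shared memory instead of hitting stale copies.
        self.lines = {a: v for a, v in self.lines.items() if a in self.dirty}


def demo():
    mem = SharedMemory()
    producer, consumer = L1Cache(mem), L1Cache(mem)
    first = consumer.read("x")            # miss: caches the initial value
    producer.write("x", 42)
    producer.release()                    # write becomes globally visible
    before_acquire = consumer.read("x")   # hit on the stale local copy
    consumer.acquire()                    # self-invalidation drops the copy
    after_acquire = consumer.read("x")    # miss: refetches the new value
    return first, before_acquire, after_acquire
```

In this model the consumer only observes the producer's write after its own acquire, which is why no directory of sharers is needed. The cost is that an acquire discards all clean lines; the abstract states that RC3 reduces such self-invalidations using per-L1 timestamps, which this sketch does not attempt to reproduce.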


Pages: 292-304

Page count: 13
