RC3: Consistency directed cache coherence for x86-64 with RC extensions

Cited by: 7
Authors
Elver, Marco [1]
Nagarajan, Vijay [1]
Affiliations
[1] Univ Edinburgh, Edinburgh EH8 9YL, Midlothian, Scotland
Source
2015 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION (PACT) | 2015
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
multiprocessors; cache coherence; memory consistency models; MODEL;
DOI
10.1109/PACT.2015.37
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
The recent convergence towards programming language based memory consistency models has sparked renewed interest in lazy cache coherence protocols. These protocols exploit synchronization information by enforcing coherence only at synchronization boundaries via self-invalidation. In effect, such protocols do not require sharer tracking, which benefits scalability. On the downside, such protocols are only readily applicable to a restricted set of consistency models, such as Release Consistency (RC), which expose synchronization information explicitly. In particular, existing architectures with stricter consistency models (such as x86-64) cannot readily make use of lazy coherence protocols without either: changing the architecture's consistency model to (a variant of) RC at the expense of backwards compatibility; or adapting the protocol to satisfy the stricter consistency model, thereby failing to benefit from synchronization information. We show an approach for the x86-64 architecture, which is a compromise between the two. First, we propose a mechanism to convey synchronization information via a simple ISA extension, while retaining backwards compatibility with legacy codes and older microarchitectures. Second, we propose RC3, a scalable hardware cache coherence protocol for RCtso, the resulting memory consistency model. RC3 does not track sharers, and relies on self-invalidation on acquires. To satisfy RCtso efficiently, the protocol reduces self-invalidations transitively using per-L1 timestamps only. RC3 outperforms a conventional lazy RC protocol by 12%, achieving performance comparable to a MESI directory protocol for RC-optimized programs. RC3's storage overhead per cache line scales logarithmically with increasing core count, and reduces on-chip coherence storage overheads by 45% compared to a related approach specifically targeting TSO.
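The general idea of lazy coherence via self-invalidation, as described in the abstract, can be illustrated with a toy model. This is a minimal sketch of a generic lazy RC scheme, not the RC3 protocol itself: the class `L1Cache`, its methods, and the `demo` function are hypothetical names introduced here, the shared level is modelled as a plain dictionary, and RC3's timestamp-based reduction of self-invalidations is not modelled at all.

```python
# Toy model of lazy coherence with self-invalidation (illustrative only).
# Assumptions: one shared memory level (a dict), private L1 caches,
# no sharer tracking, write-back of dirty lines on release,
# self-invalidation of clean lines on acquire.

class L1Cache:
    def __init__(self, shared):
        self.shared = shared   # shared level, addr -> value
        self.lines = {}        # locally cached lines, addr -> value
        self.dirty = set()     # addresses written locally, not yet written back

    def read(self, addr):
        # A hit returns the local copy, which may be stale.
        if addr not in self.lines:
            self.lines[addr] = self.shared.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def release(self):
        # Make prior writes visible before the synchronization point.
        for addr in self.dirty:
            self.shared[addr] = self.lines[addr]
        self.dirty.clear()

    def acquire(self):
        # Self-invalidate: drop all clean lines so later reads refetch.
        # No other cache needs to know which lines this cache holds.
        self.lines = {a: v for a, v in self.lines.items() if a in self.dirty}


def demo():
    shared = {}
    producer, consumer = L1Cache(shared), L1Cache(shared)
    data = 0x10
    before = consumer.read(data)   # consumer caches the initial value
    producer.write(data, 42)
    producer.release()
    stale = consumer.read(data)    # still the old copy: nothing invalidated it
    consumer.acquire()
    fresh = consumer.read(data)    # refetched after self-invalidation
    return before, stale, fresh
```

In this sketch the consumer only observes the producer's write after its own acquire, which is the behaviour that makes sharer tracking unnecessary. Invalidating every clean line on every acquire is the cost that, per the abstract, RC3 reduces using per-L1 timestamps.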
Pages: 292-304
Page count: 13