Implementation tradeoffs in the design of flexible transactional memory support

Cited by: 5
Authors
Shriraman, Arrvindh [1]
Dwarkadas, Sandhya [1]
Scott, Michael L. [1]
Affiliation
[1] Univ Rochester, Dept Comp Sci, Rochester, NY 14627 USA
Keywords
Synchronization; Atomicity; Transactional memory; Version management; Conflict detection; FlexTM
DOI
10.1016/j.jpdc.2010.03.006
Chinese Library Classification (CLC)
TP301 [Theory and methods]
Subject classification code
081202
Abstract
We present FlexTM (FLEXible Transactional Memory), a high-performance TM framework that lets software decide when (eagerly, lazily, or in a mixed fashion) and how to manage conflicts, while hardware manages transactional state and tracks conflicts. FlexTM coordinates four decoupled hardware mechanisms: read and write signatures, which summarize per-thread access sets; per-thread conflict summary tables (CSTs), which identify the processors with which conflicts have occurred; Programmable Data Isolation, which buffers speculative updates in the local cache and uses an overflow table to handle unbounded updates; and Alert-On-Update, which notifies a thread immediately when a specified location is written by another processor. The CSTs enable an STM-inspired commit protocol that manages conflicts in a decentralized manner (no global arbitration) and allows parallel commits. We explore the implementation tradeoffs associated with FlexTM's versioning and conflict detection mechanisms. Our results demonstrate that FlexTM achieves a speedup of about 5x over high-quality software TMs and about 1.8x over hybrid TMs (those with software always in the loop), with no loss in policy flexibility. We find that the distributed commit protocol improves performance by 2%-14% over an aggressive centralized arbiter mechanism that also allows parallel commits. Finally, we compare two ways of handling speculative transactional state that overflows the cache: an aggressive hardware controller (as used in the base FlexTM design) that manages and accesses the overflowed state, and a hardware-software approach dubbed FlexTM-S (FlexTM-Streamlined), in which software manages the overflow region but a metadata cache accelerates speculative data replacements and their subsequent accesses. We demonstrate that FlexTM-S performs within 10% of FlexTM despite its substantially simpler virtualization mechanism. © 2010 Elsevier Inc. All rights reserved.
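As a rough illustration of how read/write signatures and conflict summary tables can work together to support a decentralized commit, the following Python sketch simulates them in software. It is a hypothetical model, not FlexTM's hardware design: the signature width (`SIG_BITS`), the SHA-256-based hash functions, the `Signature`, `Processor`, and `System` classes, the split of each CST into R-W, W-R, and W-W sets, and the commit rule (a committing transaction aborts every active peer that read or wrote a location it wrote) are all illustrative assumptions. Coherence-based conflict detection is approximated by checking every peer's signatures directly, and data buffering (Programmable Data Isolation) and Alert-On-Update are not modeled.

```python
import hashlib

SIG_BITS = 256   # assumed signature width in bits (illustrative, not from the paper)
NUM_HASHES = 2   # assumed number of hash functions per signature


def _hash_positions(addr: int) -> list[int]:
    """Map an address to NUM_HASHES bit positions (hypothetical hash scheme)."""
    positions = []
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
        positions.append(int.from_bytes(digest[:4], "little") % SIG_BITS)
    return positions


class Signature:
    """Bloom-filter summary of an access set: false positives possible, no false negatives."""

    def __init__(self) -> None:
        self.bits = 0

    def insert(self, addr: int) -> None:
        for pos in _hash_positions(addr):
            self.bits |= 1 << pos

    def member(self, addr: int) -> bool:
        return all((self.bits >> pos) & 1 for pos in _hash_positions(addr))


class Processor:
    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.rsig = Signature()   # read signature
        self.wsig = Signature()   # write signature
        # Conflict summary tables; hardware would keep one bit per processor.
        self.cst_rw: set[int] = set()  # peers that wrote a location I read
        self.cst_wr: set[int] = set()  # peers that read a location I wrote
        self.cst_ww: set[int] = set()  # peers that wrote a location I wrote
        self.status = "active"


class System:
    def __init__(self, nprocs: int) -> None:
        self.procs = [Processor(i) for i in range(nprocs)]

    def _peers(self, p: Processor):
        """Other processors that are still running a transaction."""
        return (q for q in self.procs if q is not p and q.status == "active")

    def read(self, pid: int, addr: int) -> None:
        p = self.procs[pid]
        for q in self._peers(p):
            if q.wsig.member(addr):      # q wrote what p now reads
                p.cst_rw.add(q.pid)
                q.cst_wr.add(p.pid)
        p.rsig.insert(addr)

    def write(self, pid: int, addr: int) -> None:
        p = self.procs[pid]
        for q in self._peers(p):
            if q.rsig.member(addr):      # q read what p now writes
                p.cst_wr.add(q.pid)
                q.cst_rw.add(p.pid)
            if q.wsig.member(addr):      # both wrote the same location
                p.cst_ww.add(q.pid)
                q.cst_ww.add(p.pid)
        p.wsig.insert(addr)

    def commit(self, pid: int) -> bool:
        """Decentralized lazy commit: abort conflicting peers, then commit self."""
        p = self.procs[pid]
        if p.status != "active":         # already aborted by another committer
            return False
        for qid in sorted(p.cst_wr | p.cst_ww):
            q = self.procs[qid]
            if q.status == "active":
                q.status = "aborted"
        p.status = "committed"
        return True
```

For example, if processor 0 reads address 0x100 and processor 1 then writes it, processor 1's W-R set records processor 0, so `commit(1)` aborts processor 0 using only processor 1's own table, with no global arbiter involved. Because signatures are Bloom filters, they may report spurious conflicts but never miss a real one.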
Pages: 1068-1084
Number of pages: 17