Eliminating synchronization bottlenecks using adaptive replication

被引:3
作者
Rinard, MC
Diniz, PC
机构
[1] MIT, Comp Sci Lab, Cambridge, MA 02139 USA
[2] Univ So Calif, ISI, Marina Del Rey, CA 90202 USA
来源
ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS | 2003年 / 25卷 / 03期
关键词
algorithms; experimentation; performance; atomic operations; commutativity analysis; parallel computing; parallelizing compilers; replication; synchronization;
D O I
10.1145/641909.641911
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
This article presents a new technique, adaptive replication, for automatically eliminating synchronization bottlenecks in multithreaded programs that perform atomic operations on objects. Synchronization bottlenecks occur when multiple threads attempt to concurrently update the same object. It is often possible to eliminate synchronization bottlenecks by replicating objects. Each thread can then update its own local replica without synchronization and without interacting with other threads. When the computation needs to access the original object, it combines the replicas to produce the correct values in the original object. One potential problem is that eagerly replicating all objects may lead to performance degradation and excessive memory consumption. Adaptive replication eliminates unnecessary replication by dynamically detecting contention at each object to find and replicate only those objects that would otherwise cause synchronization bottlenecks. We have implemented adaptive replication in the context of a parallelizing compiler for a subset of C++. Given an unannotated sequential program written in C++, the compiler automatically extracts the concurrency, determines when it is legal to apply adaptive replication, and generates parallel code that uses adaptive replication to efficiently eliminate synchronization bottlenecks. In addition to automatic parallelization and adaptive replication, our compiler also implements a lock coarsening transformation that increases the granularity at which the computation locks objects. The advantage is a reduction in the frequency with which the computation acquires and releases locks; the potential disadvantage is the introduction of new synchronization bottlenecks caused by increases in the sizes of the critical sections. Because the adaptive replication transformation takes place at lock acquisition sites, there is a synergistic interaction between lock coarsening and adaptive replication. Lock coarsening drives down the overhead of using adaptive replication, and adaptive replication eliminates synchronization bottlenecks associated with the overaggressive use of lock coarsening. Our experimental results show that, for our set of benchmark programs, the combination of lock coarsening and adaptive replication can eliminate synchronization bottlenecks and significantly reduce the synchronization and replication overhead as compared to versions that use none or only one of the transformations.
引用
收藏
页码:316 / 359
页数:44
相关论文
共 46 条
[1]  
ALDRICH J, 1999, P 6 INT STAT AN S
[2]   TreadMarks: Shared memory computing on networks of workstations [J].
Amza, C ;
Cox, AL ;
Dwarkadas, S ;
Keleher, P ;
Lu, HH ;
Rajamony, R ;
Yu, WM ;
Zwaenepoel, W .
COMPUTER, 1996, 29 (02) :18-&
[3]  
[Anonymous], 1996, P 23RD ACM SIGPLAN S
[4]  
Arnold Ken., 1996, The Java Programming Language
[5]   A HIERARCHICAL O(N-LOG-N) FORCE-CALCULATION ALGORITHM [J].
BARNES, J ;
HUT, P .
NATURE, 1986, 324 (6096) :446-449
[6]  
BERRY M, 1989, 827 ICASE U ILL URB
[7]  
BRINCHHANSEN P, 1975, IEEE T SOFTWARE ENG, V1, P199
[8]  
CALLAHAN D, 1991, P 4 WORKSH LANG COMP, P169
[9]  
CHASE DR, 1990, P SIGPLAN 90 C PROGR, P296
[10]   Eliminating synchronization overhead in automatically parallelized programs using dynamic feedback [J].
Diniz, PC .
ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1999, 17 (02) :89-132