Design space exploration of a software speculative parallelization scheme

Cited by: 30
Authors
Cintra, M
Llanos, DR
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh EH9 3JZ, Midlothian, Scotland
[2] Univ Valladolid, Dept Informat, Edificio Tecnol Informat, E-47011 Valladolid, Spain
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
speculative parallelization; thread-level speculation; parallel architectures
DOI
10.1109/TPDS.2005.69
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
With speculative parallelization, code sections that cannot be fully analyzed by the compiler are optimistically executed in parallel. Hardware schemes are fast but expensive and require modifications to the processors and/or memory system. Software schemes require no changes to the hardware of existing shared-memory systems, but can suffer from significant overheads involved with the speculative execution. In fact, the performance of software schemes is highly dependent on application characteristics, the design and implementation of the scheme, and the system configuration and size. This paper explores the design space of a recently proposed software speculative parallelization scheme. In the process, we gain insight into the most beneficial features of software schemes for speculative parallelization, as well as the most influential application characteristics. For instance, experimental results show that, contrary to intuition, checking for data dependence violations on every speculative store, as opposed to at commit time, leads to little performance degradation in the worst case and to significantly better performance with large configurations. Also, scheduling policies based on windows can perform very close to fully dynamic policies with a fraction of the memory overhead. Finally, experimental results show consistent speedups in the execution of loops that cannot be parallelized at compile time, both with and without RAW data dependences, for 4 to 32 processors.
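The contrast the abstract draws between checking for dependence violations on every speculative store and checking only at commit time can be illustrated with a toy model. What follows is a minimal sketch under stated assumptions, not the scheme evaluated in the paper: the names `SpecThread` and `SpecWindow`, the per-thread address sets, and the squash rule (the first successor that read a stale value is squashed together with every later thread) are all hypothetical and chosen only for illustration. The model is also conservative, since it ignores value forwarding from intermediate threads.

```python
from dataclasses import dataclass, field


@dataclass
class SpecThread:
    """Access log for one speculative thread (hypothetical structure)."""
    exposed_loads: set = field(default_factory=set)  # addresses read before any own write
    writes: set = field(default_factory=set)         # addresses written speculatively


class SpecWindow:
    """Threads are ordered by index; a lower index is earlier in sequential order."""

    def __init__(self, n_threads):
        self.threads = [SpecThread() for _ in range(n_threads)]

    def load(self, tid, addr):
        t = self.threads[tid]
        # A load is "exposed" only if this thread has not already written addr.
        if addr not in t.writes:
            t.exposed_loads.add(addr)

    def store(self, tid, addr, eager=True):
        """Record a speculative store; with eager=True, check successors right away."""
        self.threads[tid].writes.add(addr)
        return self._violators(tid, {addr}) if eager else []

    def commit(self, tid):
        """Commit-time check: compare every address written against successors."""
        return self._violators(tid, self.threads[tid].writes)

    def _violators(self, tid, addrs):
        # Assumed squash rule: the first successor with an exposed load of a
        # written address is squashed, along with all threads after it.
        for succ in range(tid + 1, len(self.threads)):
            if self.threads[succ].exposed_loads & addrs:
                return list(range(succ, len(self.threads)))
        return []


if __name__ == "__main__":
    eager = SpecWindow(4)
    eager.load(2, 7)                          # thread 2 reads address 7 too early
    print(eager.store(0, 7))                  # violation surfaces at the store

    lazy = SpecWindow(4)
    lazy.load(2, 7)
    print(lazy.store(0, 7, eager=False))      # nothing detected yet
    print(lazy.commit(0))                     # violation surfaces only at commit
```

In this sketch the eager variant pays a successor scan on each store but learns about the violation immediately, whereas the commit-time variant lets the offending threads keep running until the earlier thread commits. This is only meant to make the trade-off concrete; the actual costs depend on the data structures and system configuration studied in the paper.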
Pages: 562-576
Page count: 15