Interprocedural Load Elimination for Dynamic Optimization of Parallel Programs

被引:13
作者
Barik, Rajkishore [1 ]
Sarkar, Vivek [1 ]
机构
[1] Rice Univ, Dept Comp Sci, Houston, TX 77251 USA
来源
18TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PROCEEDINGS | 2009年
关键词
Load elimination; scalar replacement; parallel program; dynamic compilation; dynamic optimization; memory model; MEMORY; CONSISTENCY;
D O I
10.1109/PACT.2009.32
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Load elimination is a classical compiler transformation that is increasing in importance for multi-core and many-core architectures. The effect of the transformation is to replace a memory access, such as a read of an object field or an array element, by a read of a compiler-generated temporary that can be allocated in faster and more energy-efficient storage structures such as registers and local memories (scratchpads). Unfortunately, current just-in-time and dynamic compilers perform load elimination only in limited situations. In particular, they usually make worst-case assumptions about potential side effects arising from parallel constructs and method calls. These two constraints interact with each other since parallel constructs are usually translated to low-level runtime library calls. In this paper, we introduce an interprocedural load elimination algorithm suitable for use in dynamic optimization of parallel programs. The main contributions of the paper include: a) an algorithm for load elimination in the presence of three core parallel constructs - async, finish, and isolated, b) efficient side-effect analysis for method calls, c) extended side-effect analysis for parallel constructs using an Isolation Consistency memory model, and d) performance results to study the impact of load elimination on a set of standard benchmarks using an implementation of the algorithm in Jikes RVM for optimizing programs written in a subset of the X10 v1.5 language. Our performance results show decreases in dynamic counts for getfield operations of up to 99.99%, and performance improvements of up to 1.76x on 1 core, and 1.39x on 16 cores, when comparing the algorithm in this paper with the load elimination algorithm available in Jikes RVM.
引用
收藏
页码:41 / 52
页数:12
相关论文
共 27 条
[1]   Recent advances in memory consistency models for hardware shared memory systems [J].
Adve, SV ;
Pai, VS ;
Ranganathan, P .
PROCEEDINGS OF THE IEEE, 1999, 87 (03) :445-455
[2]   May-Happen-in-Parallel Analysis of X10 Programs [J].
Agarwal, Shivali ;
Barik, Rajkishore ;
Sarkar, Vivek ;
Shyamasundar, Rudrapatna K. .
PROCEEDINGS OF THE 2007 ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING PPOPP'07, 2007, :183-193
[3]  
ALLEN FE, 1974, P IFIP C 1974, P398
[4]  
BANNING J, 1979, 6TH ANN ACM S PRINC, P29
[5]  
BARIK R, 2006, P 2006 WORKSH PROGR
[6]   Load-reuse analysis:: Design and evaluation [J].
Bodík, R ;
Gupta, R ;
Soffa, ML .
ACM SIGPLAN NOTICES, 1999, 34 (05) :64-76
[7]  
CALLAHAN D, 1990, P ACM SIGPLAN 90 C P, P53
[8]   SCALAR REPLACEMENT IN THE PRESENCE OF CONDITIONAL CONTROL FLOW [J].
CARR, S ;
KENNEDY, K .
SOFTWARE-PRACTICE & EXPERIENCE, 1994, 24 (01) :51-77
[9]  
Charles P., 2005, OOPSLA 2005 ONWARD T
[10]  
CLAUSEN LR, 1997, ACM WORKSH JAV SCI E