Interprocedural Load Elimination for Dynamic Optimization of Parallel Programs

被引：13

作者：

Barik, Rajkishore ^{[1
]}

Sarkar, Vivek ^{[1
]}

机构：

[1] Rice Univ, Dept Comp Sci, Houston, TX 77251 USA

来源：

18TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PROCEEDINGS | 2009年

关键词：

Load elimination; scalar replacement; parallel program; dynamic compilation; dynamic optimization; memory model; MEMORY; CONSISTENCY;

D O I：

10.1109/PACT.2009.32

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

Load elimination is a classical compiler transformation that is increasing in importance for multi-core and many-core architectures. The effect of the transformation is to replace a memory access, such as a read of an object field or an array element, by a read of a compiler-generated temporary that can be allocated in faster and more energy-efficient storage structures such as registers and local memories (scratchpads). Unfortunately, current just-in-time and dynamic compilers perform load elimination only in limited situations. In particular, they usually make worst-case assumptions about potential side effects arising from parallel constructs and method calls. These two constraints interact with each other since parallel constructs are usually translated to low-level runtime library calls. In this paper, we introduce an interprocedural load elimination algorithm suitable for use in dynamic optimization of parallel programs. The main contributions of the paper include: a) an algorithm for load elimination in the presence of three core parallel constructs - async, finish, and isolated, b) efficient side-effect analysis for method calls, c) extended side-effect analysis for parallel constructs using an Isolation Consistency memory model, and d) performance results to study the impact of load elimination on a set of standard benchmarks using an implementation of the algorithm in Jikes RVM for optimizing programs written in a subset of the X10 v1.5 language. Our performance results show decreases in dynamic counts for getfield operations of up to 99.99%, and performance improvements of up to 1.76x on 1 core, and 1.39x on 16 cores, when comparing the algorithm in this paper with the load elimination algorithm available in Jikes RVM.

引用

页码：41 / 52

页数：12

共 27 条

[1] Recent advances in memory consistency models for hardware shared memory systems [J].

Adve, SV ;

Pai, VS ;

Ranganathan, P .

PROCEEDINGS OF THE IEEE, 1999, 87 (03) :445-455

[2] May-Happen-in-Parallel Analysis of X10 Programs [J].

Agarwal, Shivali ;

Barik, Rajkishore ;

Sarkar, Vivek ;

Shyamasundar, Rudrapatna K. .

PROCEEDINGS OF THE 2007 ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING PPOPP'07, 2007, :183-193

[3]

ALLEN FE, 1974, P IFIP C 1974, P398

[4]

BANNING J, 1979, 6TH ANN ACM S PRINC, P29

[5]

BARIK R, 2006, P 2006 WORKSH PROGR

[6] Load-reuse analysis:: Design and evaluation [J].

Bodík, R ;

Gupta, R ;

Soffa, ML .

ACM SIGPLAN NOTICES, 1999, 34 (05) :64-76

[7]

CALLAHAN D, 1990, P ACM SIGPLAN 90 C P, P53

[8] SCALAR REPLACEMENT IN THE PRESENCE OF CONDITIONAL CONTROL FLOW [J].

CARR, S ;

KENNEDY, K .

SOFTWARE-PRACTICE & EXPERIENCE, 1994, 24 (01) :51-77

[9]

Charles P., 2005, OOPSLA 2005 ONWARD T

[10]

CLAUSEN LR, 1997, ACM WORKSH JAV SCI E

← 1 2 3 →