Generation of parallel synchronization-free tiled code

被引：3

作者：

Bielecki, Wlodzimierz ^{[1
]}

Palkowski, Marek ^{[1
]}

Skotnicki, Piotr ^{[1
]}

机构：

[1] West Pomeranian Univ Technol, Fac Comp Sci, Ul Zolnierska 49, PL-71210 Szczecin, Poland

来源：

COMPUTING | 2018年 / 100卷 / 03期

关键词：

Synchronization-free parallelism; Tiling; Transitive closure; Optimizing compiler; Polyhedral model; Iteration space slicing; AFFINE SCHEDULING PROBLEM; EFFICIENT SOLUTIONS;

D O I：

10.1007/s00607-017-0576-3

中图分类号：

TP301 [理论、方法];

学科分类号：

081202 ;

摘要：

A novel approach to generation of parallel synchronization-free tiled code for the loop nest is presented. It is derived via a combination of the Polyhedral and Iteration Space Slicing frameworks. It uses the transitive closure of loop nest dependence graphs to carry out corrections of original rectangular tiles so that all dependences of the original loop nest are preserved under the lexicographic order of target (corrected) tiles. Then parallel synchronization-free tiled code is generated on the basis of valid (corrected) tiles applying the transitive closure of dependence graphs. The main contribution of the paper is demonstrating that the presented technique is able to generate parallel synchronization-free tiled code, provided that the exact transitive closure of a dependence graph can be calculated and there exist synchronization-free slices on the statement instance level in the loop nest. We show that the presented approach extracts such a parallelism when well-known techniques fail to extract it. Enlarging the scope of loop nests, for which synchronization-free tiled code can be generated, is achieved by means of applying the intersection of extracted slices and generated valid tiles, in contrast to forming slices of valid tiles as suggested in previously published techniques based on the transitive closure of a dependence graph. The presented approach is implemented in the publicly available TC optimizing compiler. Results of experiments demonstrating the effectiveness of the approach and the efficiency of parallel programs generated by means of it are discussed.

引用

页码：277 / 302

页数：26

共 36 条

[1]

[Anonymous], 2015, NAS BENCHMARKS SUITE

[2]

Bandishti V., 2012, INT CONF HIGH PERFOR, P1, DOI DOI 10.1109/SC.2012.107

[3] Code generation in the polyhedral model is easier than you think [J].

Bastoul, C .

13TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION TECHNIQUES, PROCEEDINGS, 2004, :7-16

[4] Coarse-grained loop parallelization: Iteration Space Slicing vs affine transformations [J].

Beletska, Anna ;

Bielecki, Wlodzimierz ;

Cohen, Albert ;

Palkowski, Marek ;

Siedlecki, Krzysztof .

PARALLEL COMPUTING, 2011, 37 (08) :479-497

[5] TILING ARBITRARILY NESTED LOOPS BY MEANS OF THE TRANSITIVE CLOSURE OF DEPENDENCE GRAPHS [J].

Bielecki, Wlodzimierz ;

Palkowski, Marek .

INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE, 2016, 26 (04) :919-939

[6] Perfectly Nested Loop Tiling Transformations Based on the Transitive Closure of the Program Dependence Graph [J].

Bielecki, Wlodzimierz ;

Palkowski, Marek .

SOFT COMPUTING IN COMPUTER AND INFORMATION SCIENCE, 2015, 342 :309-320

[7] Free Scheduling of Tiles Based on the Transitive Closure of Dependence Graphs [J].

Bielecki, Wlodzimierz ;

Palkowski, Marek ;

Klimek, Tomasz .

PARALLEL PROCESSING AND APPLIED MATHEMATICS, PPAM 2015, PT II, 2016, 9574 :133-142

[8] Free scheduling for statement instances of parameterized arbitrarily nested affine loops [J].

Bielecki, Wlodzimierz ;

Palkowski, Marek ;

Klimek, Tomasz .

PARALLEL COMPUTING, 2012, 38 (09) :518-532

[9] Using basis dependence distance vectors in the modified Floyd-Warshall algorithm [J].

Bielecki, Wodzimierz ;

Kraska, Krzysztof ;

Klimek, Tomasz .

JOURNAL OF COMBINATORIAL OPTIMIZATION, 2015, 30 (02) :253-275

[10] A practical automatic polyhedral parallelizer and locality optimizer [J].

Bondhugula, Uday ;

Hartono, Albert ;

Ramanujam, J. ;

Sadayappan, P. .

ACM SIGPLAN NOTICES, 2008, 43 (06) :101-113

← 1 2 3 4 →