Optimizing Remote Communication in X10

Cited by: 0

Authors:

Thangamani, Arun [1]

Nandivada, V. Krishna [1]

Affiliations:

[1] IIT Madras, Dept CSE, Chennai, Tamil Nadu, India

Source:

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION | 2019 / Vol. 16 / No. 4

Keywords:

Remote communication; data serialization; program transformation; PGAS languages; OPTIMIZATION

DOI:

10.1145/3345558

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

X10 is a partitioned global address space programming language that supports the notion of places; a place consists of some data and some lightweight tasks called activities. Each activity runs at a place and may invoke a place-change operation (using the at-construct) to synchronously perform some computation at another place. These place-change operations can be very expensive, as they need to copy all the required data from the current place to the remote place. However, identifying the necessary number of place-change operations and the data required during each place-change operation are non-trivial tasks, especially in the context of irregular applications (like graph applications) that contain complex code with large numbers of cross-referencing objects, not all of which may actually be required at the remote place. In this article, we present AT-Com, a scheme to optimize X10 code with place-change operations. AT-Com consists of two inter-related new optimizations: (i) AT-Opt, which minimizes the amount of data serialized and communicated during place-change operations, and (ii) AT-Pruning, which identifies and elides redundant place-change operations and executes place-change operations in parallel. AT-Opt uses a novel abstraction, called the abstract place tree, to capture place-change operations in the program. For each place-change operation, AT-Opt uses a novel inter-procedural analysis to precisely identify the data required at the remote place in terms of the variables in the current scope. AT-Opt then emits the appropriate code to copy the identified data items to the remote place. AT-Pruning introduces a set of program transformation techniques to emit optimized code that avoids the redundant place-change operations. We have implemented AT-Com in the x10v2.6.0 compiler and tested it over the IMSuite benchmark kernels. Compared to the current X10 compiler, the AT-Com optimized code achieved a geometric mean speedup of 18.72x and 17.83x on a four-node (32 cores per node) Intel and two-node (16 cores per node) AMD system, respectively.
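
The data-minimization idea described in the abstract can be illustrated with a small sketch. This is not the AT-Opt analysis and not X10 code: it is a hypothetical Python analogy in which `json` stands in for the serializer, and the names `Node`, `remote_work`, `payload_whole`, and `payload_minimal` are all assumptions introduced here. It only shows why shipping the single field a remote computation reads can be far cheaper than shipping an object together with everything it references.

```python
import json


class Node:
    """Hypothetical graph node; the names here are illustrative, not from the paper."""

    def __init__(self, ident, weight):
        self.ident = ident
        self.weight = weight
        self.neighbors = []  # references to other Node objects

    def to_dict(self):
        """Recursively convert this node and every node reachable from it."""
        return {
            "ident": self.ident,
            "weight": self.weight,
            "neighbors": [n.to_dict() for n in self.neighbors],
        }


def remote_work(weight):
    """Stand-in for the body of a place-change operation; it reads only `weight`."""
    return weight * 2.0


def payload_whole(node):
    """Naive payload: the node plus its entire reachable object graph."""
    return json.dumps(node.to_dict())


def payload_minimal(node):
    """Reduced payload: only the field that the remote body actually uses."""
    return json.dumps(node.weight)


def build_star(n):
    """Build a hub node that references n leaf nodes (an acyclic toy graph)."""
    hub = Node(0, 1.5)
    hub.neighbors = [Node(i, float(i)) for i in range(1, n + 1)]
    return hub


hub = build_star(1000)
whole = payload_whole(hub)
minimal = payload_minimal(hub)

# Either payload lets the "remote" side compute the same result.
result_whole = remote_work(json.loads(whole)["weight"])
result_minimal = remote_work(json.loads(minimal))

print(len(whole), len(minimal), result_whole == result_minimal)
```

In this toy setting the choice of what to send is made by hand; the abstract states that AT-Opt derives it automatically through an inter-procedural analysis, which this sketch does not attempt to reproduce.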

Pages: 26