Optimizing Remote Communication in X10

Times cited: 0
Authors
Thangamani, Arun [1]
Nandivada, V. Krishna [1]
Affiliation
[1] IIT Madras, Dept CSE, Chennai, Tamil Nadu, India
Keywords
Remote communication; data serialization; program transformation; PGAS languages; optimization
DOI
10.1145/3345558
CLC classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
X10 is a partitioned global address space (PGAS) programming language that supports the notion of places; a place consists of some data and some lightweight tasks called activities. Each activity runs at a place and may invoke a place-change operation (using the at-construct) to synchronously perform some computation at another place. These place-change operations can be very expensive, as they need to copy all the required data from the current place to the remote place. However, identifying the necessary number of place-change operations and the data required by each of them is non-trivial, especially in irregular applications (such as graph applications) whose complex code contains large numbers of cross-referencing objects, not all of which may actually be required at the remote place. In this article, we present AT-Com, a scheme to optimize X10 code with place-change operations. AT-Com consists of two inter-related new optimizations: (i) AT-Opt, which minimizes the amount of data serialized and communicated during place-change operations, and (ii) AT-Pruning, which identifies and elides redundant place-change operations and executes place-change operations in parallel. AT-Opt uses a novel abstraction, called the abstract place tree, to capture place-change operations in the program. For each place-change operation, AT-Opt uses a novel inter-procedural analysis to precisely identify the data required at the remote place in terms of the variables in the current scope. AT-Opt then emits the appropriate code to copy the identified data items to the remote place. AT-Pruning introduces a set of program transformation techniques to emit optimized code that avoids the redundant place-change operations. We have implemented AT-Com in the X10 v2.6.0 compiler and tested it over the IMSuite benchmark kernels.
Compared to the existing X10 compiler, AT-Com-optimized code achieved geometric mean speedups of 18.72x and 17.83x on a four-node Intel system (32 cores per node) and a two-node AMD system (16 cores per node), respectively.
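The two ideas summarized above can be illustrated with a toy Java model. This is a sketch under stated assumptions, not the AT-Com implementation: Java serialization stands in for the copying performed by an X10 at-construct, and the `Vertex` class, its fields, and the helpers `serializedBytes`, `naivePlaceChanges`, and `fusedPlaceChanges` are hypothetical. A remote computation that reads only `rank` needs just that value, whereas a naive deep copy also ships the large `neighbors` array; likewise, back-to-back operations that target the same place are assumed to be fusable into a single place change.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.List;

// Toy model only: Java serialization stands in for the data an X10
// at-construct copies; all names here are hypothetical.
public class AtComSketch {

    // Hypothetical graph vertex. The remote computation reads only `rank`,
    // but a naive deep copy also ships the large `neighbors` array.
    static class Vertex implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final int[] neighbors;
        final double rank;

        Vertex(int id, int[] neighbors, double rank) {
            this.id = id;
            this.neighbors = neighbors;
            this.rank = rank;
        }
    }

    // Size in bytes of the Java-serialized form of `obj`.
    static int serializedBytes(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size(); // stream is closed (and flushed) at this point
    }

    // Baseline: every remote operation performs its own place change.
    static int naivePlaceChanges(List<Integer> targetPlaces) {
        return targetPlaces.size();
    }

    // Assumed redundancy: consecutive operations aimed at the same place are
    // fused into one place change. Not checked here: whether any code must
    // run at the home place between them.
    static int fusedPlaceChanges(List<Integer> targetPlaces) {
        int changes = 0;
        Integer previous = null;
        for (Integer place : targetPlaces) {
            if (!place.equals(previous)) {
                changes++;
                previous = place;
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Vertex v = new Vertex(7, new int[10_000], 0.25);
        // Naive: copy the whole object graph reachable from `v`.
        System.out.println("naive copy: " + serializedBytes(v) + " bytes");
        // In the spirit of AT-Opt: copy only the value the remote code uses.
        System.out.println("rank only:  " + serializedBytes(Double.valueOf(v.rank)) + " bytes");

        List<Integer> targets = List.of(1, 1, 1, 2, 2, 1);
        System.out.println("place changes: " + naivePlaceChanges(targets)
                + " -> " + fusedPlaceChanges(targets));
    }
}
```

With the inputs in `main`, the last line prints `place changes: 6 -> 3` (Java 9 or later is assumed for `List.of`). Fusing consecutive operations is only safe when no code must run at the originating place in between, and this sketch does not check that condition.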
Pages: 26