A Machine Learning Approach for Productive Data Locality Exploitation in Parallel Computing Systems

Cited by: 0
Authors
Kayraklioglu, Engin [1]
Favry, Erwan [1]
El-Ghazawi, Tarek [1]
Affiliation
[1] George Washington University, Department of Electrical and Computer Engineering, Washington, DC 20052, USA
Source
2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) | 2019
Keywords
data locality; distributed memory; machine learning; optimization
DOI
10.1109/CCGRID.2019.00050
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Data locality is critically important when programming distributed-memory architectures because of its implications for latency and energy consumption. Automated compiler and runtime system optimization studies have attempted to improve data locality exploitation without burdening the programmer. However, the difficulty of static code analysis, the conservatism compilers adopt to avoid incorrect transformations, and the cost of dynamic analysis limit the efficacy of automated optimizations. As a result, programmers must still spend significant effort optimizing locality. In this work, we present an automated code optimization framework that trains neural networks on application profiles collected at small data sizes, which exhibit access patterns similar to those of larger cases. The application is then modified to use the neural network to improve data locality exploitation. We prototype our framework for the Chapel language and integrate it with the language stack. We experimentally demonstrate that our framework can learn access patterns and create optimized executables in minutes. The resulting executables run more than an order of magnitude faster than unoptimized code and perform comparably to manually optimized code, without burdening the programmer or hindering productivity.
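The pipeline the abstract describes (profile at small data sizes, learn the access pattern, then use the prediction to fetch remote data ahead of time) can be illustrated with the minimal sketch below. It is not the authors' Chapel implementation and makes several assumptions: the neural network is swapped for a one-feature least-squares fit, the data is a 1-D block-distributed array, and the profiled application is a hypothetical reversed-access kernel (`reverse_access`). All names (`LinearAccessModel`, `profile`, `remote_indices`, `prefetch_plan`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinearAccessModel:
    """Predicts which index iteration i reads, in normalized coordinates:
    y ~= w * x + b with x = i / (n - 1) and y = j / (n - 1).
    A one-feature least-squares fit standing in for a neural network
    (simplifying assumption, not the paper's model)."""
    w: float = 0.0
    b: float = 0.0

    def fit(self, xs, ys):
        # Closed-form ordinary least squares for a single feature.
        m = len(xs)
        mean_x = sum(xs) / m
        mean_y = sum(ys) / m
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.w = sxy / sxx if sxx else 0.0
        self.b = mean_y - self.w * mean_x
        return self

    def predict_index(self, i, n):
        # Undo the normalization and clamp to valid array bounds.
        j = round((self.w * i / (n - 1) + self.b) * (n - 1))
        return min(max(j, 0), n - 1)


def reverse_access(i, n):
    """Hypothetical kernel being profiled: iteration i reads B[n - 1 - i]."""
    return n - 1 - i


def profile(n, access):
    """Hypothetical small-size profiling run: normalized (i, accessed index)."""
    return [(i / (n - 1), access(i, n) / (n - 1)) for i in range(n)]


def block_size(n, num_locales):
    return -(-n // num_locales)  # ceiling division


def owner(j, n, num_locales):
    """Locale owning index j under a block distribution."""
    return j // block_size(n, num_locales)


def remote_indices(model, n, locale, num_locales):
    """Indices this locale is predicted to read from other locales."""
    block = block_size(n, num_locales)
    lo, hi = locale * block, min((locale + 1) * block, n)
    predicted = {model.predict_index(i, n) for i in range(lo, hi)}
    return sorted(j for j in predicted if owner(j, n, num_locales) != locale)


def prefetch_plan(indices, n, num_locales):
    """Coalesce sorted remote indices into (owner, start, stop) half-open
    ranges, so each range could be fetched with one bulk transfer."""
    plan = []
    for j in indices:
        o = owner(j, n, num_locales)
        if plan and plan[-1][0] == o and plan[-1][2] == j:
            plan[-1] = (o, plan[-1][1], j + 1)
        else:
            plan.append((o, j, j + 1))
    return plan


if __name__ == "__main__":
    # Train only on small profiles (n = 16, 32), then plan for n = 1000.
    samples = profile(16, reverse_access) + profile(32, reverse_access)
    xs, ys = zip(*samples)
    model = LinearAccessModel().fit(xs, ys)
    remote = remote_indices(model, 1000, locale=0, num_locales=4)
    print(prefetch_plan(remote, 1000, num_locales=4))  # → [(3, 750, 1000)]
```

Normalizing indices by `n - 1` is what lets a model fitted at n = 16 and n = 32 transfer to n = 1000 in this sketch. An access pattern that is not linear in the index would need a more expressive model, such as a neural network.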
Pages: 361-370
Page count: 10