Scheduling optimization of parallel linear algebra algorithms using Supervised Learning

被引:2
作者
Laberge, Gabriel [1 ]
Shirzad, Shahrzad [2 ]
Diehl, Patrick [2 ]
Kaiser, Hartmut [2 ]
Prudhomme, Serge [1 ]
Lemoine, Adrian S. [2 ]
机构
[1] Polytech Montreal, Dept Math & Ind Engn, Montreal, PQ, Canada
[2] Louisiana State Univ, Ctr Computat & Technol, Baton Rouge, LA 70803 USA
来源
PROCEEDINGS OF 2019 5TH IEEE/ACM WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC 2019) | 2019年
关键词
HPC; linear algebra; loop-level parallelism; supervised learning; HPX; Blaze; AMT; PERFORMANCE PREDICTION; SCHEME;
D O I
10.1109/MLHPC49564.2019.00009
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Linear algebra algorithms are used widely in a variety of domains, e.g machine learning, numerical physics and video games graphics. For all these applications, loop-level parallelism is required to achieve high performance. However, finding the optimal way to schedule the workload between threads is a non-trivial problem because it depends on the structure of the algorithm being parallelized and the hardware the executable is run on. In the realm of Asynchronous Many Task runtime systems, a key aspect of the scheduling problem is predicting the proper chunk-size, where the chunk-size is defined as the number of iterations of a for-loop assigned to a thread as one task. In this paper, we study the applications of supervised learning models to predict the chunk-size which yields maximum performance on multiple parallel linear algebra operations using the HPX backend of Blaze's linear algebra library. More precisely, we generate our training and tests sets by measuring performance of the application with different chunk-sizes for multiple linear algebra operations; vector-addition, matrix-vector-multiplication, matrix-matrix addition and matrix-matrix-multiplication. We compare the use of logistic regression, neural networks and decision trees with a newly developed decision tree based model in order to predict the optimal value for chunk-size. Our results show that classical decision trees and our custom decision tree model are able to forecast a chunk-size which results in good performance for the linear algebra operations.
引用
收藏
页码:31 / 43
页数:13
相关论文
共 24 条
[1]  
[Anonymous], 2001, P 2001 ACM IEEE C SU
[2]  
Blagojevic F, 2008, LECT NOTES COMPUT SC, V4917, P38, DOI 10.1007/978-3-540-77560-7_4
[3]  
Falgout RD, 2002, LECT NOTES COMPUT SC, V2331, P632
[4]  
Gebauer Axel, 2014, BirdingASIA, V21, P6
[5]  
Hastie T., 2005, MATH INTELL, V27, P241
[6]  
Heller T., 2017, HPXAN OPEN SOURCE C
[7]   FACTORING - A METHOD FOR SCHEDULING PARALLEL LOOPS [J].
HUMMEL, SF ;
SCHONBERG, E ;
FLYNN, LE .
COMMUNICATIONS OF THE ACM, 1992, 35 (08) :90-101
[8]   EXPRESSION TEMPLATES REVISITED: A PERFORMANCE ANALYSIS OF CURRENT METHODOLOGIES [J].
Iglberger, Klaus ;
Hager, Georg ;
Treibig, Jan ;
Ruede, Ulrich .
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2012, 34 (02) :C42-C69
[9]  
Ipek E, 2005, LECT NOTES COMPUT SC, V3648, P196
[10]  
Kaiser H., 2019, STELLAR GROUP HPX HP