Using Ensemble Learning to Improve Automatic Vectorization of Tensor Contraction Program

Cited by: 3
Authors
Liu, Hui [1 ,2 ]
Zhao, Rongcai [1 ]
Nie, Kai [1 ,3 ]
Affiliations
[1] PLA Informat Engn Univ, State Key Lab Math Engn & Adv Comp, Zhengzhou 450001, Henan, Peoples R China
[2] Henan Normal Univ, Coll Comp & Informat Engn, Xinxiang 453007, Peoples R China
[3] Zhengzhou Univ, Sch Informat Engn, Zhengzhou 450001, Henan, Peoples R China
Keywords
Automatic vectorization; compiler optimization; ensemble learning; program features; COMPILER HEURISTICS; MACHINE; OPTIMIZATION
DOI
10.1109/ACCESS.2018.2867151
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Automatic vectorization is crucial for improving the performance of computationally intensive programs. Existing compilers use conservative optimization strategies for automatic vectorization, which in many cases lead to missed vectorization opportunities. Studies have shown that using machine learning algorithms to build a performance prediction model can improve program performance: the model takes program features as input and outputs the predicted optimization strategies, or the program performance under those optimizations. In this paper, we focus on tensor contraction, a computationally intensive loop structure that is common in quantum chemical simulations. Most existing machine learning methods rely on control and data flow graphs as features to represent programs, but different tensor contraction kernels share the same control and data flow graph. In addition, existing methods often use a single kind of learning algorithm to construct the model, which is prone to overfitting and low prediction accuracy. In this paper, we propose an automatic vectorization performance enhancement method based on ensemble learning. We construct an ensemble learning model to predict the performance of tensor contraction kernels under different vectorization strategies and select the best strategy for each kernel. Based on the storage access patterns of the tensor contraction kernels, we propose a static method for feature representation. With a multi-algorithm ensemble learning strategy, we obtain better learning results than any single learning algorithm. The experimental results show that the prediction model achieves 88% and 87% prediction efficiency on two platforms with different instruction sets, data types, and compilers, a substantial improvement over existing methods. In addition, the average peak performance is 2.96x that of the Intel ICC 12.0 compiler and 2.98x that of the GCC 4.6 compiler, respectively.
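To make the described pipeline concrete, below is a minimal sketch in Python, not the authors' implementation. The feature set (loop depth, per-array innermost strides, output reuse, strategy id), the candidate strategy list, and the base learners (a random forest, an SVR, and a k-NN regressor combined via scikit-learn's VotingRegressor) are all illustrative assumptions, and the training data is synthetic. The stride features reflect why access patterns matter: kernels such as C[i][j] += A[i][k]*B[k][j] and C[i][j] += A[k][i]*B[j][k] share the same control and data flow graph but differ in their innermost-loop strides.

# Minimal sketch (assumptions, not the paper's code): predict the speedup of
# each candidate vectorization strategy for a tensor contraction kernel from
# static, access-pattern-based features, then keep the best-predicted strategy.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Hypothetical static features for one (kernel, strategy) pair:
# [loop depth, innermost stride of A, innermost stride of B,
#  output-tensor reuse, strategy id].
def make_sample(loop_depth, stride_a, stride_b, reuse, strategy_id):
    return [loop_depth, stride_a, stride_b, reuse, strategy_id]

# Synthetic stand-in for measured training data (features -> speedup).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 8.0, size=(300, 5))
y = 1.0 + 0.4 * X[:, 4] - 0.2 * X[:, 1] + rng.normal(0.0, 0.1, 300)

# Multi-algorithm ensemble: heterogeneous regressors whose predictions are
# averaged, which is less prone to overfitting than any single learner.
model = VotingRegressor([
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("svr", SVR(C=10.0)),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
]).fit(X, y)

# Compile-time decision: score every candidate strategy for a new kernel
# and select the one with the highest predicted speedup.
strategies = ["scalar", "inner-loop SIMD", "outer-loop SIMD", "unroll + SIMD"]
kernel = (3, 1.0, 8.0, 2.0)  # loop depth, stride of A, stride of B, reuse
cands = np.array([make_sample(*kernel, s) for s in range(len(strategies))])
print("chosen strategy:", strategies[int(np.argmax(model.predict(cands)))])

In the paper itself, the training targets would come from measured runs of the kernels under each vectorization strategy; the synthetic targets above only stand in for that measurement step.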
Pages: 47112-47124
Number of pages: 13