Guide Automatic Vectorization by means of Machine Learning: A Case Study of Tensor Contraction Kernels

Cited by: 3

Authors:

Trouve, Antoine [1]

Cruz, Arnaldo J. [1]

Murakami, Kazuaki J. [1]

Arai, Masaki [2]

Nakahira, Tadashi [2]

Yamanaka, Eiji [3]

Affiliations:

[1] Kyushu Univ, Dept Engn, Fukuoka 8190395, Japan

[2] Fujitsu Labs Ltd, Kawasaki, Kanagawa 2118588, Japan

[3] Fujitsu Ltd, Tokyo 1057123, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2016 / Vol. E99D / No. 06

Keywords:

automatic vectorization; machine learning; software optimization; OPTIMIZATION

DOI:

10.1587/transinf.2015EDP7440

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Modern optimizing compilers tend to be conservative and often fail to vectorize programs that would have benefited from it. In this paper, we propose a way to predict the relevant command-line options of the compiler so that it chooses the most profitable vectorization strategy. Machine learning has proven to be a relevant approach for this matter: fed with features that describe the software to the compiler, a machine learning device is trained to predict an appropriate optimization strategy. The related work relies on the control and data flow graphs as software features. In this article, we consider tensor contraction programs, useful in various scientific simulations, especially chemistry. Depending on how they access the memory, different tensor contraction kernels may yield very different performance figures. However, they exhibit identical control and data flow graphs, making them completely out of reach of the related work. In this paper, we propose an original set of software features that capture the important properties of the tensor contraction kernels. Considering the Intel Merom processor architecture with the Intel Compiler, we model the problem as a classification problem and we solve it using a support vector machine. Our technique predicts the best suited vectorization options of the compiler with a cross-validation accuracy of 93.4%, leading to up to a 3-times speedup compared to the default behavior of the Intel Compiler. This article ends with an original qualitative discussion on the performance of software metrics by means of visualization. All our measurements are made available for the sake of reproducibility.
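To illustrate the abstract's point that tensor contraction kernels can share the same loop structure yet access memory differently, here is a minimal sketch. It is not the feature set proposed in the paper, which this record does not describe. The helper names (`strides`, `innermost_strides`), the row-major layout, the 4x4 array shapes, and the two sample kernels are all assumptions chosen for illustration.

```python
def strides(shape):
    """Row-major element strides for an array of the given shape (assumed layout)."""
    out, step = [], 1
    for extent in reversed(shape):
        out.append(step)
        step *= extent
    return tuple(reversed(out))


def innermost_strides(accesses, shapes, innermost):
    """Hypothetical feature: the stride, in elements, that each array sees
    when only the innermost loop index advances by one."""
    result = {}
    for name, indices in accesses.items():
        s = strides(shapes[name])
        result[name] = sum(st for idx, st in zip(indices, s) if idx == innermost)
    return result


# Both kernels run the same i, j, k loop nest: C[i][j] += A[..] * B[..].
# Only the order of the indices used to read A and B differs.
SHAPES = {"A": (4, 4), "B": (4, 4), "C": (4, 4)}
KERNEL_UNIT = {"C": ("i", "j"), "A": ("i", "k"), "B": ("j", "k")}
KERNEL_STRIDED = {"C": ("i", "j"), "A": ("k", "i"), "B": ("k", "j")}

unit = innermost_strides(KERNEL_UNIT, SHAPES, "k")
strided = innermost_strides(KERNEL_STRIDED, SHAPES, "k")
print(unit)     # {'C': 0, 'A': 1, 'B': 1}
print(strided)  # {'C': 0, 'A': 4, 'B': 4}
```

In the first kernel both inputs are read contiguously along the innermost loop, while in the second both are read with a stride of one full row. A description based only on control and data flow would not distinguish the two, which is the gap the abstract says the proposed features are meant to close.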


Pages: 1585-1594

Page count: 10

