Predictive Modeling in a Polyhedral Optimization Space

Cited: 36
Authors
Park, Eunjung [1 ]
Cavazos, John [1 ]
Pouchet, Louis-Noel [2 ,3 ]
Bastoul, Cedric [4 ]
Cohen, Albert [5 ]
Sadayappan, P. [2 ]
Affiliations
[1] Univ Delaware, Newark, DE USA
[2] Ohio State Univ, Columbus, OH 43210 USA
[3] Univ Calif Los Angeles, Los Angeles, CA USA
[4] Univ Paris 11, LRI, F-91405 Orsay, France
[5] INRIA Paris-Rocquencourt / ENS, Le Chesnay, France
Funding
National Science Foundation (US);
Keywords
Loop transformation; Polyhedral optimization; Iterative compilation; Machine learning; Performance counters; AFFINE SCHEDULING PROBLEM; ITERATIVE OPTIMIZATION; LOOP TRANSFORMATIONS; EFFICIENT SOLUTIONS; COMPILATION; GENERATION;
DOI
10.1007/s10766-013-0241-1
Chinese Library Classification
TP301 [Theory, Methods];
Subject Classification Code
081202;
Abstract
High-level program optimizations, such as loop transformations, are critical for high performance on multi-core targets. However, complex sequences of loop transformations are often required to expose parallelism (both coarse-grain and fine-grain) and improve data locality. The polyhedral compilation framework has proved to be very effective at representing these complex sequences and restructuring compute-intensive applications, seamlessly handling perfectly and imperfectly nested loops. It models arbitrarily complex sequences of loop transformations in a unified mathematical framework, dramatically increasing the expressiveness (and expected effectiveness) of the loop optimization stage. Nevertheless, identifying the most effective loop transformations remains a major challenge: current state-of-the-art heuristics in polyhedral frameworks simply fail to expose good performance over a wide range of numerical applications. Their lack of effectiveness is mainly due to simplistic performance models that do not reflect the complexity of today's processors (CPU, cache behavior, etc.). We address the problem of selecting the best polyhedral optimizations with dedicated machine learning models, trained specifically on the target machine. We show that these models can quickly select high-performance optimizations with very limited iterative search. We decouple the problem of selecting good complex sequences of optimizations into two stages: (1) we narrow the set of candidate optimizations using static cost models to select the loop transformations that implement specific high-level optimizations (e.g., tiling, parallelism, etc.); (2) we predict the performance of each high-level complex optimization sequence with trained models that take as input a performance-counter characterization of the original program. Our end-to-end framework is validated using numerous benchmarks on two modern multi-core platforms.
We investigate a variety of different machine learning algorithms and hardware counters, and we obtain performance improvements over production compilers ranging on average from to , by running no more than program variants from a polyhedral optimization space.
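The two-stage approach described in the abstract can be illustrated with a minimal sketch: stage 1 (static polyhedral cost models) narrows the space to a handful of high-level sequences, and stage 2 ranks them with a model trained on performance-counter characterizations of previously seen programs. All names, counter vectors, and speedups below are invented for illustration, and a simple nearest-neighbor predictor stands in for the paper's actual learned models.

```python
# Hypothetical sketch of two-stage optimization selection; the data and
# the nearest-neighbor model are illustrative stand-ins, not the paper's.

def euclidean(a, b):
    """Distance between two performance-counter feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_speedups(counters, training_set):
    """1-nearest-neighbor prediction: return the speedup profile of the
    training program whose counter vector is closest to `counters`."""
    nearest = min(training_set,
                  key=lambda ex: euclidean(ex["counters"], counters))
    return nearest["speedups"]

# Stage 1 (not modeled here): static cost models keep only a few
# candidate high-level sequences, e.g. tiling / parallelism variants.
candidates = ["tile", "tile+parallel", "parallel", "none"]

# Made-up training data: counter vectors mapped to measured speedups.
training_set = [
    {"counters": [0.9, 0.1],
     "speedups": {"tile": 2.1, "tile+parallel": 3.0,
                  "parallel": 1.4, "none": 1.0}},
    {"counters": [0.2, 0.8],
     "speedups": {"tile": 1.1, "tile+parallel": 1.2,
                  "parallel": 2.5, "none": 1.0}},
]

# Stage 2: characterize the new program with counters, predict each
# candidate's speedup, and run only the top-ranked variant(s).
new_program_counters = [0.85, 0.15]
predicted = predict_speedups(new_program_counters, training_set)
best = max(candidates, key=predicted.get)
print(best)
```

In the paper's actual framework the candidate set comes from polyhedral static cost models and the predictor is trained offline per target machine; the key idea retained here is that counter features of the *original* program, not of each variant, drive the ranking, so very few variants need to be executed.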
Pages: 704-750
Page count: 47