Autotuning in High-Performance Computing Applications

Times cited: 83
Authors
Balaprakash, Prasanna [1]
Dongarra, Jack [2, 3, 4]
Gamblin, Todd [5]
Hall, Mary [6]
Hollingsworth, Jeffrey K. [7]
Norris, Boyana [8]
Vuduc, Richard [9]
Affiliations
[1] Argonne Natl Lab, 9700 S Cass Ave, Argonne, IL 60439 USA
[2] Univ Tennessee, Knoxville, TN 37996 USA
[3] Oak Ridge Natl Lab, Oak Ridge, TN 37831 USA
[4] Univ Manchester, Manchester M13 9PL, Lancs, England
[5] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA
[6] Univ Utah, Salt Lake City, UT 84112 USA
[7] Univ Maryland, College Pk, MD 20742 USA
[8] Univ Oregon, Eugene, OR 97403 USA
[9] Georgia Inst Technol, Atlanta, GA 30332 USA
Funding
U.S. National Science Foundation
Keywords
High-performance computing; performance tuning; programming systems; SEARCH; IMPLEMENTATION; DESIGN
DOI
10.1109/JPROC.2018.2841200
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
Autotuning refers to the automatic generation of a search space of possible implementations of a computation that are evaluated through models and/or empirical measurement to identify the most desirable implementation. Autotuning has the potential to dramatically improve the performance portability of petascale and exascale applications. To date, autotuning has been used primarily in high-performance applications through tunable libraries or previously tuned application code that is integrated directly into the application. This paper draws on the authors' extensive experience applying autotuning to high-performance applications, describing both successes and future challenges. If autotuning is to be widely used in the HPC community, researchers must address the software engineering challenges, manage configuration overheads, and continue to demonstrate significant performance gains and portability across architectures. In particular, tools that configure the application must be integrated into the application build process so that tuning can be reapplied as the application and target architectures evolve.
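The definition above (generate a search space of implementation variants, evaluate each through a model and/or empirical measurement, keep the most desirable one) can be illustrated with a minimal Python sketch. It assumes a hypothetical tiled matrix-transpose kernel whose two tile sizes are the tuning parameters; `make_transpose_kernel`, `generate_search_space`, `measure`, `toy_cost_model`, and `autotune` are illustrative names invented here, not an API from the paper or from any autotuning framework, and the cost model is a deliberately trivial stand-in for a real performance model.

```python
import itertools
import time
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]
Kernel = Callable[..., object]


def make_transpose_kernel(n: int):
    """Hypothetical tunable kernel: tiled transpose of an n x n matrix."""
    src = [[float(i * n + j) for j in range(n)] for i in range(n)]

    def kernel(tile_i: int, tile_j: int) -> List[List[float]]:
        dst = [[0.0] * n for _ in range(n)]
        # Visit the matrix tile by tile; the tile sizes are the tuning knobs.
        for ii in range(0, n, tile_i):
            for jj in range(0, n, tile_j):
                for i in range(ii, min(ii + tile_i, n)):
                    row = src[i]
                    for j in range(jj, min(jj + tile_j, n)):
                        dst[j][i] = row[j]
        return dst

    return src, kernel


def generate_search_space(params: Dict[str, List[int]]) -> List[Config]:
    """Enumerate every combination of candidate parameter values."""
    names = sorted(params)
    return [dict(zip(names, values))
            for values in itertools.product(*(params[name] for name in names))]


def measure(kernel: Kernel, config: Config, repeats: int = 3) -> float:
    """Empirical evaluation: best wall-clock time over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(**config)
        best = min(best, time.perf_counter() - start)
    return best


def toy_cost_model(kernel: Kernel, config: Config) -> float:
    """Model-based evaluation (invented toy model): prefer 32 x 32 tiles."""
    return abs(config["tile_i"] - 32) + abs(config["tile_j"] - 32)


def autotune(kernel: Kernel,
             params: Dict[str, List[int]],
             evaluate: Callable[[Kernel, Config], float] = measure,
             ) -> Tuple[Config, Dict[Tuple, float]]:
    """Evaluate every variant in the search space and return the cheapest."""
    results: Dict[Tuple, float] = {}
    for config in generate_search_space(params):
        results[tuple(sorted(config.items()))] = evaluate(kernel, config)
    best_key = min(results, key=results.get)
    return dict(best_key), results


if __name__ == "__main__":
    src, kernel = make_transpose_kernel(128)
    space = {"tile_i": [8, 16, 32, 64], "tile_j": [8, 32, 128]}
    best, timings = autotune(kernel, space)  # empirical measurement
    print("fastest measured config:", best)
    print("model-predicted config:", autotune(kernel, space, toy_cost_model)[0])
```

Exhaustive enumeration is used here only to keep the sketch short; realistic search spaces are often far too large for it, which is why practical autotuners usually rely on heuristic or model-guided search and may combine model predictions with a smaller number of empirical runs.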
Pages: 2068-2083
Page count: 16