HeteroMap: A Runtime Performance Predictor for Efficient Processing of Graph Analytics on Heterogeneous Multi-Accelerators

Cited by: 11
Authors
Ahmad, Masab [1]
Dogan, Halit [1]
Michael, Christopher J. [2]
Khan, Omer [1]
Affiliations
[1] Univ Connecticut, Storrs, CT 06269 USA
[2] NRL, John C Stennis Space Ctr, MS USA
Source
2019 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS) | 2019
Funding
National Science Foundation (USA);
Keywords
BENCHMARK;
DOI
10.1109/ISPASS.2019.00039
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
With the ever-increasing amount of data and input variations, portable performance is becoming harder to exploit on today's architectures. Computational setups utilize single-chip processors, such as GPUs or large-scale multicores for graph analytics. Some algorithm-input combinations perform more efficiently when utilizing a GPU's higher concurrency and bandwidth, while others perform better with a multicore's stronger data caching capabilities. Architectural choices also occur within selected accelerators, where variables such as threading and thread placement need to be decided for optimal performance. This paper proposes a performance predictor paradigm for a heterogeneous parallel architecture where multiple disparate accelerators are integrated in an operational high performance computing setup. The predictor aims to improve graph processing efficiency by exploiting the underlying concurrency variations within and across the heterogeneous integrated accelerators using graph benchmark and input characteristics. The evaluation shows that intelligent and real-time selection of near-optimal concurrency choices provides performance benefits ranging from 5% to 3.8x, and an energy benefit averaging around 2.4x over the traditional single-accelerator setup.
Pages: 268-281
Page count: 14