Performance Portability Study of Epistasis Detection using SYCL on NVIDIA GPU

Cited by: 4

Authors:

Jin, Zheming [1]

Vetter, Jeffrey S. [1]

Affiliation:

[1] Oak Ridge Natl Lab, POB 2008, Oak Ridge, TN 37830 USA

Source:

13TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, BCB 2022 | 2022

Keywords:

portability; programming model; GPU; epistasis;

DOI:

10.1145/3535508.3545591

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We describe the experience of converting a CUDA implementation of a high-order epistasis detection algorithm to SYCL. Our goal is to give application and compiler developers a detailed description of migration paths between CUDA and SYCL. Evaluating the CUDA and SYCL applications on an NVIDIA V100 GPU, we find that loop unrolling must be applied manually to the SYCL kernel to obtain comparable performance. The performance of the SYCL group reduce function, an alternative to the CUDA warp-based reduction, depends on the problem and work-group sizes. The 64-bit popcount operation implemented with a tree of adders is slightly faster than the built-in popcount operation. When the number of OpenMP threads is four, the highest performance of the SYCL and CUDA applications is comparable.


Pages: 8

50 items in total

[21] Performance Portability Evaluation of OpenCL Benchmarks across Intel and NVIDIA Platforms
Bertoni, Colleen
Kwack, JaeHyuk
Applencourt, Thomas
Ghadar, Yasaman
Homerding, Brian
Knight, Christopher
Videau, Brice
Zheng, Huihuo
Morozov, Vitali
Parker, Scott
2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2020), 2020, : 330 - 339
[22] CHARM-SYCL & IRIS: A Tool Chain for Performance Portability on Extremely Heterogeneous Systems
Fujita, Norihisa
Johnston, Beau
Miniskar, Narasinga Rao
Kobayashi, Ryohei
Monil, Mohammad Alaul Haque
Teranishi, Keita
Lee, Seyong
Vetter, Jeffrey S.
Boku, Taisuke
2024 IEEE 20TH INTERNATIONAL CONFERENCE ON E-SCIENCE, E-SCIENCE 2024, 2024,
[23] Performance Portability of a GPU Enabled Factorization with the DAGuE Framework
Bosilca, George
Bouteiller, Aurelien
Herault, Thomas
Lemarinier, Pierre
Saengpatsa, Narapat Ohm
Tomov, Stanimire
Dongarra, Jack J.
2011 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2011, : 395 - 402
[24] Enhancing the Programmability and Performance Portability of GPU Tensor Operations
Mazaheri, Arya
Schulte, Johannes
Moskewicz, Matthew W.
Wolf, Felix
Jannesari, Ali
EURO-PAR 2019: PARALLEL PROCESSING, 2019, 11725 : 213 - 226
[25] Taking GPU Programming Models to Task for Performance Portability
Davis, Joshua H.
Sivaraman, Pranav
Kitson, Joy
Parasyris, Konstantinos
Menon, Harshitha
Minn, Isaac
Georgakoudis, Giorgis
Bhatele, Abhinav
arXiv,
[26] NVIDIA A100 Tensor Core GPU: Performance and Innovation
Choquette, Jack
Gandhi, Wishwesh
Giroux, Olivier
Stam, Nick
Krashinsky, Ronny
IEEE MICRO, 2021, 41 (02) : 29 - 35
[27] NVIDIA Hopper H100 GPU: Scaling Performance
Choquette, Jack
IEEE MICRO, 2023, 43 (03) : 9 - 17
[28] Performance Evaluation of the NVIDIA Pascal GPU Architecture: Early Experiences
Reano, Carlos
Silla, Federico
PROCEEDINGS OF 2016 IEEE 18TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 14TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2016, : 1234 - 1235
[29] Performance Analysis of NVIDIA GPU Virtualization in NARI Desktop Cloud
Wang Zhao
Miao Jingwen
Yu Jun
Zhu Guangxin
2019 3RD INTERNATIONAL CONFERENCE ON DATA SCIENCE AND BUSINESS ANALYTICS (ICDSBA 2019), 2019, : 405 - 408
[30] OpenCL Performance Portability for Xeon Phi Coprocessor and NVIDIA GPUs: A Case Study of Finite Element Numerical Integration
Banas, Krzysztof
Kruzel, Filip
EURO-PAR 2014: PARALLEL PROCESSING WORKSHOPS, PT II, 2014, 8806 : 158 - 169
