Performance Portability Study of Epistasis Detection using SYCL on NVIDIA GPU

Cited by: 4

Authors:

Jin, Zheming [1]

Vetter, Jeffrey S. [1]

Affiliations:

[1] Oak Ridge Natl Lab, POB 2008, Oak Ridge, TN 37830 USA

Source:

13TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, BCB 2022 | 2022

Keywords:

portability; programming model; GPU; epistasis

DOI:

10.1145/3535508.3545591

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We describe our experience converting a CUDA implementation of a high-order epistasis detection algorithm to SYCL. Our goal is to give application and compiler developers a detailed description of migration paths between CUDA and SYCL. Evaluating the CUDA and SYCL applications on an NVIDIA V100 GPU, we find that loop unrolling must be applied manually to the SYCL kernel to obtain comparable performance. The performance of the SYCL group reduce function, an alternative to the CUDA warp-based reduction, depends on the problem and work-group sizes. The 64-bit popcount operation implemented with a tree of adders is slightly faster than the built-in popcount operation. When the number of OpenMP threads is four, the highest performance of the SYCL application is comparable to that of the CUDA application.
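To illustrate what a "tree of adders" popcount can look like, here is a minimal C++ sketch of the classic bit-parallel technique. It is not taken from the paper: the function name `popcount64_tree` is hypothetical, and the authors' kernel may be structured differently. Each step adds neighbouring bit fields in parallel, doubling the field width until a single 64-bit sum remains.

```cpp
#include <cstdint>

// Hypothetical helper (not from the paper): counts the set bits of a
// 64-bit word with a tree of adders instead of a built-in popcount.
inline unsigned popcount64_tree(std::uint64_t x) {
    // Level 1: add adjacent 1-bit fields into 2-bit sums.
    x = (x & 0x5555555555555555ULL) + ((x >> 1) & 0x5555555555555555ULL);
    // Level 2: add adjacent 2-bit sums into 4-bit sums.
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    // Level 3: add adjacent 4-bit sums into 8-bit sums.
    x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL);
    // Level 4: add adjacent 8-bit sums into 16-bit sums.
    x = (x & 0x00FF00FF00FF00FFULL) + ((x >> 8) & 0x00FF00FF00FF00FFULL);
    // Level 5: add adjacent 16-bit sums into 32-bit sums.
    x = (x & 0x0000FFFF0000FFFFULL) + ((x >> 16) & 0x0000FFFF0000FFFFULL);
    // Level 6: add the two 32-bit sums into the final count.
    x = (x & 0x00000000FFFFFFFFULL) + (x >> 32);
    return static_cast<unsigned>(x);
}
```

The function uses only shifts, masks, and additions, so the same body could plausibly be placed in a CUDA device function or a SYCL kernel; whether it beats the built-in operation depends on the compiler and hardware, as the abstract notes for the V100.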


Pages: 8
