Performance Portability Study of Epistasis Detection using SYCL on NVIDIA GPU

Times cited: 4
Authors
Jin, Zheming [1]
Vetter, Jeffrey S. [1]
Affiliation
[1] Oak Ridge Natl Lab, POB 2008, Oak Ridge, TN 37830 USA
Source
13TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, BCB 2022, 2022
Keywords
portability; programming model; GPU; epistasis
DOI
10.1145/3535508.3545591
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We describe our experience converting a CUDA implementation of a high-order epistasis detection algorithm to SYCL. Our goal is to help application and compiler developers by describing the migration paths between CUDA and SYCL in detail. Evaluating the CUDA and SYCL applications on an NVIDIA V100 GPU, we find that loop unrolling must be applied manually to the SYCL kernel to obtain performance comparable to that of the CUDA kernel. The performance of the SYCL group reduce function, an alternative to the CUDA warp-based reduction, depends on the problem size and the work-group size. A 64-bit popcount implemented with a tree of adders is slightly faster than the built-in popcount operation. When four OpenMP threads are used, the SYCL and CUDA applications achieve comparable peak performance.
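The abstract reports that loop unrolling had to be applied by hand to the SYCL kernel. The following host-side C++ sketch illustrates manual unrolling by a factor of four. It is not the authors' kernel. The bit-vector genotype encoding, the function names (`count_rolled`, `count_unrolled4`, `check_unrolled`), and the unroll factor are all assumptions for illustration. `__builtin_popcountll` is a GCC/Clang intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical encoding (assumption): each 64-bit word holds one bit per
// sample, and a[i] & b[i] marks the samples that carry both genotypes.

// Rolled loop: one AND + popcount + add per iteration.
inline std::uint32_t count_rolled(const std::uint64_t* a, const std::uint64_t* b,
                                  std::size_t n) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += static_cast<std::uint32_t>(__builtin_popcountll(a[i] & b[i]));
    return sum;
}

// Same computation unrolled by hand by a factor of 4. Four independent
// accumulators expose instruction-level parallelism; a remainder loop
// handles an n that is not a multiple of 4.
inline std::uint32_t count_unrolled4(const std::uint64_t* a, const std::uint64_t* b,
                                     std::size_t n) {
    std::uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += static_cast<std::uint32_t>(__builtin_popcountll(a[i] & b[i]));
        s1 += static_cast<std::uint32_t>(__builtin_popcountll(a[i + 1] & b[i + 1]));
        s2 += static_cast<std::uint32_t>(__builtin_popcountll(a[i + 2] & b[i + 2]));
        s3 += static_cast<std::uint32_t>(__builtin_popcountll(a[i + 3] & b[i + 3]));
    }
    for (; i < n; ++i)  // remainder iterations
        s0 += static_cast<std::uint32_t>(__builtin_popcountll(a[i] & b[i]));
    return s0 + s1 + s2 + s3;
}

// Usage check: both versions must produce the same count.
inline void check_unrolled(const std::uint64_t* a, const std::uint64_t* b, std::size_t n) {
    assert(count_rolled(a, b, n) == count_unrolled4(a, b, n));
}
```

In a SYCL kernel, the same transformation would be written inside the kernel lambda. Whether it pays off depends on the compiler and the loop's trip count.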
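To make the reduction comparison concrete, the sketch below emulates on the host the shuffle-down tree that a CUDA warp-based reduction typically uses. The corresponding CUDA and SYCL 2020 calls are noted in comments. The emulation, its names (`warp_reduce_emulated`, `check_warp_reduce`), and the use of `int` values are assumptions. It does not reproduce the paper's kernels.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>

constexpr std::size_t kWarpSize = 32;  // NVIDIA warp width

// Host-side emulation of a CUDA warp shuffle-down reduction. In device code
// each step would be `v += __shfl_down_sync(0xffffffffu, v, offset);`.
// A SYCL 2020 counterpart is a single call such as
// `sycl::reduce_over_group(it.get_group(), v, sycl::plus<>())` on an nd_item `it`.
inline int warp_reduce_emulated(std::array<int, kWarpSize> lanes) {
    for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
        const std::array<int, kWarpSize> before = lanes;  // all lanes read before any lane writes
        for (std::size_t lane = 0; lane + offset < kWarpSize; ++lane)
            lanes[lane] += before[lane + offset];
        // Lanes with lane + offset >= 32 have no source lane. Their results do
        // not affect lane 0, so they are left unchanged here (CUDA would instead
        // return the lane's own value from the shuffle).
    }
    return lanes[0];  // after log2(32) = 5 steps, lane 0 holds the total
}

// Usage check against a sequential sum.
inline void check_warp_reduce(const std::array<int, kWarpSize>& lanes) {
    assert(warp_reduce_emulated(lanes) == std::accumulate(lanes.begin(), lanes.end(), 0));
}
```

The warp version fixes the reduction width at 32 lanes. By contrast, `sycl::reduce_over_group` applied to a work-group leaves the reduction strategy to the SYCL implementation.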
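A tree-of-adders popcount, as mentioned in the abstract, can be written with shifts, masks, and additions. The sketch below shows one common formulation, the SWAR method described in Hacker's Delight, which is not necessarily the authors' exact code. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Tree-of-adders popcount: adjacent bit fields are summed in parallel,
// doubling the field width each step (1 -> 2 -> 4 -> 8 bits). The eight
// byte counts are then folded into the low byte with shifts and adds.
inline unsigned popcount64_tree(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit counts
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit counts
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // 8-bit counts
    x += x >> 8;   // byte 0 = count of bytes 0..1
    x += x >> 16;  // byte 0 = count of bytes 0..3
    x += x >> 32;  // byte 0 = count of bytes 0..7
    return static_cast<unsigned>(x & 0x7F);  // total <= 64 fits in the low 7 bits
}

// Built-in variant for comparison. __builtin_popcountll is a GCC/Clang
// intrinsic; device code would use sycl::popcount (SYCL) or __popcll (CUDA).
inline unsigned popcount64_builtin(std::uint64_t x) {
    return static_cast<unsigned>(__builtin_popcountll(x));
}

// Usage check: both variants must agree.
inline void check_popcount(std::uint64_t x) {
    assert(popcount64_tree(x) == popcount64_builtin(x));
}
```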
Pages: 8