Performance Portability Study of Epistasis Detection using SYCL on NVIDIA GPU

Cited by: 4
Authors
Jin, Zheming [1]
Vetter, Jeffrey S. [1]
Affiliations
[1] Oak Ridge Natl Lab, POB 2008, Oak Ridge, TN 37830 USA
Source
13TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, BCB 2022 | 2022
Keywords
portability; programming model; GPU; epistasis
DOI
10.1145/3535508.3545591
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We describe our experience converting a CUDA implementation of a high-order epistasis detection algorithm to SYCL. Our goal is to give application and compiler developers a detailed description of migration paths between CUDA and SYCL. Evaluating the CUDA and SYCL applications on an NVIDIA V100 GPU, we find that loop unrolling must be applied manually to the SYCL kernel to obtain comparable performance. The performance of the SYCL group reduce function, an alternative to the CUDA warp-based reduction, depends on the problem size and the work-group size. A 64-bit popcount operation implemented with a tree of adders is slightly faster than the built-in popcount operation. With four OpenMP threads, the highest performance of the SYCL application is comparable to that of the CUDA application.
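The abstract does not show the popcount code itself, so the following is only a minimal sketch of what a 64-bit popcount built from a tree of adders commonly looks like: the textbook SWAR formulation, written in plain C++ so it can be checked on a host. The function names `popcount64_tree` and `popcount64_reference` are hypothetical and not taken from the paper. On a device, the built-in alternatives it would be compared against are `__popcll` in CUDA and `sycl::popcount` in SYCL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name, not from the paper.
// Tree-of-adders (SWAR) population count: adjacent bit fields are summed
// pairwise, doubling the field width at every level (1 -> 2 -> 4 -> ... -> 64).
constexpr std::uint32_t popcount64_tree(std::uint64_t x) {
    x = (x & 0x5555555555555555ULL) + ((x >> 1)  & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2)  & 0x3333333333333333ULL);
    x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4)  & 0x0F0F0F0F0F0F0F0FULL);
    x = (x & 0x00FF00FF00FF00FFULL) + ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = (x & 0x0000FFFF0000FFFFULL) + ((x >> 16) & 0x0000FFFF0000FFFFULL);
    x = (x & 0x00000000FFFFFFFFULL) + (x >> 32);
    return static_cast<std::uint32_t>(x);
}

// Hypothetical bit-by-bit reference, used only to cross-check the tree version.
constexpr std::uint32_t popcount64_reference(std::uint64_t x) {
    std::uint32_t count = 0;
    while (x != 0) {
        count += static_cast<std::uint32_t>(x & 1ULL);
        x >>= 1;
    }
    return count;
}

// Compile-time checks (C++14 or later).
static_assert(popcount64_tree(0ULL) == 0, "no bits set");
static_assert(popcount64_tree(0xF0F0F0F0F0F0F0F0ULL) == 32, "half the bits set");
static_assert(popcount64_tree(0x0123456789ABCDEFULL) ==
              popcount64_reference(0x0123456789ABCDEFULL), "matches reference");
```

The tree needs six add levels for 64 bits, each a handful of shifts, masks, and adds. Whether that beats a hardware popcount instruction depends on the compiler and the device, which is consistent with the abstract reporting only a slight difference.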
Pages: 8