Performance Improvement of CUDA Applications by Reducing CPU-GPU Data Transfer Overhead

Cited by: 0
Authors
Sunitha, N. V. [1]
Raju, K. [1]
Chiplunkar, Niranjan N. [1]
Affiliations
[1] NMAMIT, Dept CSE, Nitte, India
Source
PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INVENTIVE COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES (ICICCT) | 2017
Keywords
Heterogeneous system; CUDA; Kernel; Stream;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In a CPU-GPU heterogeneous computing system, the input data to be processed by the kernel resides in host memory. Because the host and device memory address spaces are separate, the device cannot directly access host memory. In the CUDA programming model, data is therefore moved between host memory and device memory, and this transfer is time-consuming. The communication overhead can be hidden by overlapping data transfer with kernel execution, and CUDA streams provide a means of doing so. In this paper, we explore the effects of overlapping data transfer and kernel execution on the overall execution time of some CUDA applications. The results show that using the different levels of concurrency supported by streams improves the performance of the CUDA applications.
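The overlap described in the abstract can be illustrated with a rough timing model. The sketch below is not taken from the paper: the function names `serial_time` and `overlapped_time`, the three-engine assumption (one host-to-device copy engine, one compute engine, one device-to-host copy engine), and the per-chunk timings are all hypothetical and chosen only to show why splitting the work into chunks issued on separate streams can shorten total execution time.

```python
def serial_time(n_chunks, h2d, kernel, d2h):
    """Total time when every chunk is copied in, processed, and copied
    out strictly one after another (no overlap)."""
    return n_chunks * (h2d + kernel + d2h)


def overlapped_time(n_chunks, h2d, kernel, d2h):
    """Makespan when each chunk is issued on its own stream.

    Assumption (hypothetical hardware model): three independent engines,
    one for host-to-device copies, one for kernels, one for
    device-to-host copies. Each engine handles one chunk at a time,
    in issue order.
    """
    h2d_free = kernel_free = d2h_free = 0.0  # time at which each engine is idle
    for _ in range(n_chunks):
        # The copy-in starts as soon as the H2D engine is idle.
        h2d_done = h2d_free + h2d
        h2d_free = h2d_done
        # The kernel needs its input and an idle compute engine.
        kernel_done = max(h2d_done, kernel_free) + kernel
        kernel_free = kernel_done
        # The copy-out needs the kernel result and an idle D2H engine.
        d2h_done = max(kernel_done, d2h_free) + d2h
        d2h_free = d2h_done
    return d2h_free


if __name__ == "__main__":
    # Illustrative per-chunk timings in milliseconds (made up, not measured).
    n, h2d, kernel, d2h = 4, 2.0, 3.0, 2.0
    s = serial_time(n, h2d, kernel, d2h)
    o = overlapped_time(n, h2d, kernel, d2h)
    print(f"serial: {s:.1f} ms, overlapped: {o:.1f} ms, speedup: {s / o:.2f}x")
    # prints "serial: 28.0 ms, overlapped: 16.0 ms, speedup: 1.75x"
```

In this model the overlapped time approaches the time of the slowest stage multiplied by the number of chunks, so the benefit depends on how the transfer and kernel times compare. Real hardware may have fewer copy engines or additional launch overheads that this sketch ignores.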
Pages: 211 - 215
Page count: 5
Related papers
20 items in total
  • [1] Boosting CUDA Applications with CPU-GPU Hybrid Computing
    Lee, Changmin
    Ro, Won Woo
    Gaudiot, Jean-Luc
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2014, 42 (02) : 384 - 404
  • [2] Heterogeneous programming using OpenMP and CUDA/HIP for hybrid CPU-GPU scientific applications
    Tallada, Marc Gonzalez
    Morancho, Enric
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2023, 37 (05) : 626 - 646
  • [3] Boosting CUDA Applications with CPU–GPU Hybrid Computing
    Changmin Lee
    Won Woo Ro
    Jean-Luc Gaudiot
    International Journal of Parallel Programming, 2014, 42 : 384 - 404
  • [4] Performance Optimization for CPU-GPU Heterogeneous Parallel System
    Wang, Yanhua
    Qiao, Jianzhong
    Lin, Shukuan
    Zhao, Tinglei
    2016 12TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2016, : 1259 - 1266
  • [5] Comparison of analytical and ML-based models for predicting CPU-GPU data transfer time
    Riahi, Ali
    Savadi, Abdorreza
    Naghibzadeh, Mahmoud
    COMPUTING, 2020, 102 (09) : 2099 - 2116
  • [6] Accurate Measurements and Precise Modeling of Power Dissipation of CUDA Kernels toward Power Optimized High Performance CPU-GPU Computing
    Suda, Reiji
    Ren, Da Qi
    2009 INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT 2009), 2009, : 432 - 438
  • [7] Performance modeling and analysis of heterogeneous lattice Boltzmann simulations on CPU-GPU clusters
    Feichtinger, Christian
    Habich, Johannes
    Koestler, Harald
    Ruede, Ulrich
    Aoki, Takayuki
    PARALLEL COMPUTING, 2015, 46 : 1 - 13
  • [8] High Performance FFT Based Poisson Solver on a CPU-GPU Heterogeneous Platform
    Wu, Jing
    JaJa, Joseph
    IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013), 2013, : 115 - 125
  • [9] Feedback Control Optimization for Performance and Energy Efficiency on CPU-GPU Heterogeneous Systems
    Lin, Feng-Sheng
    Liu, Po-Ting
    Li, Ming-Hua
    Hsiung, Pao-Ann
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2016, 2016, 10048 : 388 - 404
  • [10] CPU-GPU Tuning for Modern Scientific Applications using Node-Level Heterogeneity
    Thavappiragasam, Mathialakan
    Kale, Vivek
    2023 IEEE 30TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS, HIPC 2023, 2023, : 179 - 183