Efficient fine-grained shared buffer management for multiple OpenCL devices

Cited by: 1
Authors
Xun, Chang-qing [1, 2]
Chen, Dong [1, 2]
Lan, Qiang [1, 2]
Zhang, Chun-yuan [1, 2]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, State Key Lab High Performance Comp, Changsha 410073, Hunan, Peoples R China
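Source
JOURNAL OF ZHEJIANG UNIVERSITY-SCIENCE C-COMPUTERS & ELECTRONICS | 2013, Vol. 14, No. 11
Funding
National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education;
Keywords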
Shared buffer; OpenCL; Heterogeneous programming; Fine grained; CPU; GPU;
DOI
10.1631/jzus.C1300078
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
OpenCL provides full code portability across hardware platforms and is a good programming candidate for heterogeneous systems, which typically consist of a host processor and several accelerators. However, to make full use of the computing capacity of such a system, programmers are required to manage the diverse OpenCL-enabled devices explicitly, including distributing the workload among devices and managing data transfers between them. These tedious tasks pose a significant challenge for programmers. In this paper, a distributed shared OpenCL memory (DSOM) is presented, which relieves users of having to manage data transfers explicitly by supporting shared buffers across devices. DSOM allocates shared buffers in system memory and treats on-device memory as a software-managed virtual cache buffer. To support fine-grained shared buffer management, we designed a kernel parser in DSOM for buffer access range analysis. A basic modified, shared, invalid (MSI) cache coherence protocol is implemented in DSOM to keep the cache buffers coherent. In addition, we propose a novel strategy that minimizes communication cost between devices by launching each necessary data transfer as early as possible, which enables data transfer to overlap with kernel execution. Our experimental results show that the method for buffer access range analysis is broadly applicable and that DSOM is efficient.
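The modified/shared/invalid scheme mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `SharedBuffer`, the method names, the `"host"` label, and the whole-buffer granularity are all assumptions made for illustration (the abstract describes tracking at finer granularity through access range analysis). The sketch only records which transfers a coherence protocol of this kind would have to issue.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"  # this device holds the only up-to-date copy
    SHARED = "S"    # this device's copy matches the system-memory copy
    INVALID = "I"   # this device's copy must not be used


class SharedBuffer:
    """Hypothetical whole-buffer MSI tracker; the master copy lives on the host."""

    def __init__(self, devices):
        self.state = {dev: State.INVALID for dev in devices}
        self.transfers = []  # log of (source, destination) copies

    def _write_back_owner(self):
        # If some device holds a modified copy, flush it to system memory.
        for dev, st in self.state.items():
            if st is State.MODIFIED:
                self.transfers.append((dev, "host"))
                self.state[dev] = State.SHARED

    def acquire_read(self, dev):
        # A valid copy (M or S) can be read without any transfer.
        if self.state[dev] is State.INVALID:
            self._write_back_owner()
            self.transfers.append(("host", dev))
            self.state[dev] = State.SHARED

    def acquire_write(self, dev):
        # Obtain a valid copy first, then invalidate every other device.
        self.acquire_read(dev)
        for other in self.state:
            if other != dev:
                self.state[other] = State.INVALID
        self.state[dev] = State.MODIFIED


buf = SharedBuffer(["cpu", "gpu"])
buf.acquire_write("gpu")  # host -> gpu, gpu becomes MODIFIED
buf.acquire_read("cpu")   # gpu -> host write-back, then host -> cpu
print(buf.transfers)
```

In this simplified model a kernel that writes the buffer calls `acquire_write` before launch and a kernel that only reads it calls `acquire_read`; issuing those calls as soon as a kernel's arguments are known is one way the resulting transfers could be started early enough to overlap with other kernels.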
Pages: 859-872
Page count: 14