An OpenCL Framework for Heterogeneous Multicores with Local Memory

Authors:

Lee, Jaejin [1]

Kim, Jungwon [1]

Seo, Sangmin [1]

Kim, Seungkyun [1]

Park, Jungho [1]

Kim, Honggyu [1]

Thanh Tuan Dao [1]

Cho, Yongjin [1]

Seo, Sung Jong

Lee, Seung Hak

Cho, Seung Mo

Song, Hyo Jung

Suh, Sang-Bum

Choi, Jong-Deok

Affiliations:

[1] Seoul National University, School of Computer Science and Engineering, Seoul 151-744, South Korea

Source:

PACT 2010: Proceedings of the Nineteenth International Conference on Parallel Architectures and Compilation Techniques | 2010

Keywords:

OpenCL; Compilers; Runtime; Software-managed caches; Memory consistency; Work-item coalescing; Preload-poststore buffering

DOI:

Not available

Chinese Library Classification number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

In this paper, we present the design and implementation of an Open Computing Language (OpenCL) framework that targets heterogeneous accelerator multicore architectures with local memory. The architecture consists of a general-purpose processor core and multiple accelerator cores that typically do not have any cache. Instead, each accelerator core has a small internal local memory. To overcome the limited size of the local memory, our OpenCL runtime is based on software-managed caches and coherence protocols that guarantee OpenCL memory consistency. To boost performance, the runtime relies on three source-code transformation techniques performed by our OpenCL C source-to-source translator: work-item coalescing, web-based variable expansion, and preload-poststore buffering. Work-item coalescing serializes multiple SPMD-like tasks that execute concurrently in the presence of barriers and runs them sequentially on a single accelerator core. It requires the web-based variable expansion technique to allocate local memory for private variables. Preload-poststore buffering is a buffering technique that eliminates the overhead of software cache accesses. Together with work-item coalescing, it has a synergistic effect on boosting performance. We show the effectiveness of our OpenCL framework by evaluating its performance on a system that consists of two Cell BE processors. The experimental results show that our approach is promising.
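
The abstract describes work-item coalescing and variable expansion only in words, so a small illustration may help. The following is a minimal sketch in plain C, not the authors' translator output: the kernel `scale_and_shift`, the fixed work-group size `WG_SIZE`, and the helper `run_demo` are all hypothetical names chosen for illustration. It assumes the general idea stated in the abstract, namely that the work-items of one work-group are run sequentially on a single core, with the kernel body split at the barrier and a private variable that is live across the barrier expanded into an array with one slot per work-item.

```c
#include <assert.h>
#include <stdio.h>

#define WG_SIZE 4 /* assumed work-group size, fixed for this sketch */

/*
 * Hypothetical OpenCL C kernel that the sketch below emulates:
 *
 *   __kernel void scale_and_shift(__global float *a, __local float *tmp) {
 *       int i = get_local_id(0);
 *       float x = a[i] * 2.0f;          // private, live across the barrier
 *       tmp[i] = x;
 *       barrier(CLK_LOCAL_MEM_FENCE);
 *       a[i] = x + tmp[(i + 1) % get_local_size(0)];
 *   }
 */

/* Coalesced form: one loop over work-items per barrier-delimited region. */
void coalesced_kernel(float *a, float *tmp)
{
    /* Variable expansion: the private scalar x becomes one slot per work-item. */
    float x[WG_SIZE];

    /* Region before the barrier, executed for every work-item in turn. */
    for (int i = 0; i < WG_SIZE; i++) {
        x[i] = a[i] * 2.0f;
        tmp[i] = x[i];
    }

    /* The barrier sits here: the first loop has finished for all work-items. */

    /* Region after the barrier, again executed for every work-item in turn. */
    for (int i = 0; i < WG_SIZE; i++) {
        a[i] = x[i] + tmp[(i + 1) % WG_SIZE];
    }
}

/* Small driver: runs one work-group and prints the result. */
int run_demo(void)
{
    float a[WG_SIZE] = {1.0f, 2.0f, 3.0f, 4.0f};
    float tmp[WG_SIZE];

    coalesced_kernel(a, tmp);
    assert(a[0] == 6.0f && a[3] == 10.0f);

    for (int i = 0; i < WG_SIZE; i++) {
        printf("%g ", a[i]);
    }
    printf("\n");
    return 0;
}
```

Calling `run_demo()` from a `main` function runs one emulated work-group. Without the expansion of `x` into an array, the second loop would see only the value computed by the last work-item, which is why the abstract says that coalescing requires variable expansion.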

Pages: 193-204

Page count: 12

50 items in total

[1] A Runtime Resource Management Policy for OpenCL Workloads on Heterogeneous Multicores
Angioletti, Daniele
Bertani, Francesco
Bolchini, Cristiana
Cerizzi, Francesco
Miele, Antonio
2019 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2019, : 1385 - 1390
[2] Accelerating Local Feature Extraction using OpenCL on Heterogeneous Platforms
Moren, Konrad
Perschke, Thomas
Goehringer, Diana
PROCEEDINGS OF THE 2014 CONFERENCE ON DESIGN AND ARCHITECTURES FOR SIGNAL AND IMAGE PROCESSING, 2014,
[3] Multi-Task Scheduling Framework for OpenCL Programs on CPUsGPUs Heterogeneous Platforms
Wang, Hao
Wang, Haofeng
Wang, Sufang
THIRD INTERNATIONAL CONFERENCE ON ELECTRONICS AND COMMUNICATION; NETWORK AND COMPUTER TECHNOLOGY (ECNCT 2021), 2022, 12167
[4] Aristotle: A performance impact indicator for the OpenCL kernels using local memory
Fang, Jianbin
Sips, Henk
Varbanescu, Ana Lucia
SCIENTIFIC PROGRAMMING, 2014, 22 (03) : 239 - 257
[5] Optimizing Convolutional Neural Network on FPGA under Heterogeneous Computing Framework with OpenCL
Wang, Zhengrong
Qiao, Fei
Liu, Zhen
Shan, Yuxiang
Zhou, Xunyi
Luo, Li
Yang, Huazhong
PROCEEDINGS OF THE 2016 IEEE REGION 10 CONFERENCE (TENCON), 2016, : 3433 - 3438
[6] Flexible Parallel Implementation of LLR BP Decoding Simulation on Multicores Using OpenCL
Volkov, Igor
Kharin, Aleksei
Dryakhlov, Aleksei
Mirokhin, Evgeny
Terekhov, Konstantin
Zavertkin, Konstantin
Ovinnikov, Aleksei
Likhobabin, Evgeny
Vityazev, Vladimir
2017 25TH TELECOMMUNICATION FORUM (TELFOR), 2017, : 258 - 261
[7] Heterogeneous System Implementation of Deep Learning Neural Network for Object Detection in OpenCL Framework
Li, Shuai
Luo, Yukui
Sun, Kuangyuan
Choi, Ken
2018 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2018, : 456 - 459
[8] ELMO: A User-Friendly API to Enable Local Memory in OpenCL Kernels
Fang, Jianbin
Varbanescu, Ana Lucia
Shen, Jie
Sips, Henk
PROCEEDINGS OF THE 2013 21ST EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING, 2013, : 375 - 383
[9] GPU-FPGA Heterogeneous Computing with OpenCL-enabled Direct Memory Access
Kobayashi, Ryohei
Fujita, Norihisa
Yamaguchi, Yoshiki
Nakamichi, Ayumi
Boku, Taisuke
2019 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2019, : 489 - 498
[10] Grover: Looking for Performance Improvement by Disabling Local Memory Usage in OpenCL Kernels
Fang, Jianbin
Sips, Henk
Jaaskelainen, Pekka
Varbanescu, Ana Lucia
2014 43RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2014, : 162 - 171