A collaborative CPU-GPU approach for deep learning on mobile devices

Citations: 8

Authors:

Valery, Olivier ^{[1,2]}

Liu, Pangfeng ^{[1,3]}

Wu, Jan-Jan ^{[1]}

Affiliations:

[1] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei, Taiwan

[2] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei, Taiwan

[3] Natl Taiwan Univ, Grad Inst Networking & Multimedia, Taipei, Taiwan

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2019 / Vol. 31 / Issue 17

Keywords:

deep learning; energy efficient; GPGPU; heterogeneous system; mobile computing; OpenCL

DOI:

10.1002/cpe.5225

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification code:

081202; 0835

Abstract:

As mobile devices become more prevalent, users tend to reassess their expectations regarding the personalization of mobile services. The data collected by a mobile device's sensors provide an opportunity to gain insight into the user's profile. Recently, deep learning has gained momentum and has become the method of choice for solving machine learning problems. Interestingly, training a deep neural network on a mobile device is often mistakenly regarded as cumbersome. For instance, several deep learning frameworks only provide a CPU-based implementation for prediction tasks on a mobile device. In contrast to servers, a mobile computing environment imposes many domain-specific constraints that invite us to review the general computing approach used in a deep learning framework implementation. In this paper, we propose a deep learning framework that has been specifically designed for mobile device platforms. Our approach relies on the collaboration of the multicore CPU and the integrated GPU to accelerate deep learning computation on mobile devices. Our work exploits the shared memory architecture of mobile devices to promote CPU-GPU collaboration without any data copying. We analyze our approach with regard to three factors: performance/portability trade-off, power efficiency, and memory management.


Pages: 21
