Performance portability in a real world application: PHAST applied to Caffe

Times cited: 2
Authors
Antonio Martinez, Pablo [1]
Peccerillo, Biagio [2]
Bartolini, Sandro [2]
Garcia, Jose M. [1]
Bernabe, Gregorio [1]
Affiliations
[1] Univ Murcia, Comp Engn Dept, Murcia, Spain
[2] Univ Siena, Dept Informat Engn & Math, Siena, Italy
Keywords
High-performance computing; performance portability; heterogeneous computing; machine learning
DOI
10.1177/10943420221077107
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
This work applies the PHAST Library, a hardware-agnostic programming library, to a real-world application: the Caffe framework. The original implementation of Caffe consists of two separate versions of the source code, one for CPU platforms and another for GPUs. With PHAST, we aim to develop a single-source implementation capable of running efficiently on both CPUs and GPUs. In this paper, we first analyze the performance of a basic implementation of Caffe using PHAST. We then detail possible performance improvements. We find that overall performance is dominated by a few 'heavy' layers. To refine the inefficient parts of this version, we take two approaches: improvements to the Caffe source code and improvements to the PHAST Library itself, both of which ultimately translate into better performance for the PHAST version of Caffe. We demonstrate that our PHAST implementation achieves performance portability on CPUs and GPUs. With a single source, the PHAST version of Caffe provides the same or even better performance than the original Caffe, which is built from two different codebases. For the MNIST database, the PHAST implementation takes an amount of time equivalent to that of native code on both CPU and GPU. Furthermore, with the CIFAR-10 database, PHAST achieves speedups of 51% and 49% over native code on CPU and GPU, respectively. These results open a new horizon for software development in the upcoming era of heterogeneous computing.
Pages: 419-439
Page count: 21
References: 38 in total