Utilizing cloud FPGAs towards the open neural network standard

Cited by: 16
Authors
Danopoulos, Dimitrios [1]
Kachris, Christoforos [2,3]
Soudris, Dimitrios [1]
Affiliations
[1] NTUA, Dept Elect & Comp Engn, Athens, Greece
[2] Democritus Univ Thrace, Athens, Greece
[3] ICCS NTUA, Athens, Greece
Keywords
Machine learning; Neural networks; ONNX; FPGAs; High level synthesis; Cloud; Heterogeneous computing;
DOI
10.1016/j.suscom.2021.100520
CLC classification
TP3 [Computing technology, computer technology]
Subject classification
0812
Abstract
Accurate and efficient machine learning algorithms are of vital importance to many problems, especially classification and clustering tasks, but they need a universal AI model standard. Unifying machine learning models into a common ecosystem can lead to less development time and better framework interoperability. ONNX (Open Neural Network Exchange) is a popular open format for representing deep learning models so that AI developers can more easily move models between state-of-the-art tools. On top of that, hardware companies such as Nvidia and Intel try to keep up with this trend and produce hardware-optimized runtimes (i.e., for CPUs, GPUs, FPGAs) that can handle open-format AI models like ONNX. That enables developers to leverage a heterogeneous mix of hardware and use whichever AI framework they prefer. FPGAs, however, require a more challenging deployment strategy, even though the platform is proven to address these kinds of problems very efficiently in terms of performance and power. This work builds on HLS4ML, an early-development-stage project originally created for particle physics applications that automatically generates neural networks (NNs) for embedded Xilinx FPGAs. Our work adds hardware-aware NN training and a generalized optimization scheme on top of HLS4ML that boosts the performance and power efficiency of the package and adds the ability to generate cloud FPGA firmware from any NN model. We start from FPGA-oriented training of a Keras model for image recognition, convert it to the open ONNX format, then port and optimize it for cloud FPGAs using a novel scheme with optimizations in the host, memory, and kernels, at multiple levels of network precision. To the best of our knowledge this is a novel approach, achieving a speed-up of up to 102× over a single CPU and up to 5.5× better performance/watt than a GPU.
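The "multiple levels of network precision" mentioned in the abstract typically refers to fixed-point quantization of weights and activations, as used by HLS4ML via HLS `ap_fixed<W,I>` types. A minimal sketch of one such precision level, assuming a simple signed round-to-nearest, saturating scheme (the function name and default bit widths are illustrative, not taken from the paper):

```python
def to_fixed(x, total_bits=16, int_bits=6):
    """Model an ap_fixed<total_bits, int_bits>-style signed fixed-point
    value: round to the nearest representable value, then saturate to
    the representable range. Sweeping total_bits/int_bits emulates the
    different precision levels an FPGA NN design can be built at."""
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits
    # Representable range of a signed two's-complement fixed-point number
    lo = -(1 << (total_bits - 1)) / scale
    hi = ((1 << (total_bits - 1)) - 1) / scale
    q = round(x * scale) / scale  # round to nearest grid point
    return min(max(q, lo), hi)    # saturate instead of wrapping
```

For example, with 16 total bits and 6 integer bits there are 10 fractional bits, so 0.1 quantizes to 102/1024 ≈ 0.0996, and values outside roughly ±32 saturate; narrower widths trade accuracy for FPGA resources.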
Pages: 7
相关论文
共 25 条
[1]  
Abdelouahab K., 2018, CORR ARXIV180601683
[2]  
Alemdar H, 2017, IEEE IJCNN, P2547, DOI 10.1109/IJCNN.2017.7966166
[3]  
Betkaoui B., 2010, Proceedings 2010 International Conference on Field-Programmable Technology (FPT 2010), P94, DOI 10.1109/FPT.2010.5681761
[4]  
Chollet F., 2015, Keras
[5]   Understanding Performance Differences of FPGAs and GPUs [J].
Cong, Jason ;
Fang, Zhenman ;
Lo, Michael ;
Wang, Hanrui ;
Xu, Jingxian ;
Zhang, Shaochong .
PROCEEDINGS 26TH IEEE ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2018), 2018, :93-96
[6]  
Danopoulos D, 2018, 2018 7TH INTERNATIONAL CONFERENCE ON MODERN CIRCUITS AND SYSTEMS TECHNOLOGIES (MOCAST)
[7]   Automatic Generation of FPGA Kernels From Open Format CNN Models [J].
Danopoulos, Dimitrios ;
Kachris, Christoforos ;
Soudris, Dimitrios .
28TH IEEE INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2020, :237-237
[8]   Fast inference of deep neural networks in FPGAs for particle physics [J].
Duarte, J. ;
Han, S. ;
Harris, P. ;
Jindariani, S. ;
Kreinar, E. ;
Kreis, B. ;
Ngadiuba, J. ;
Pierini, M. ;
Rivera, R. ;
Tran, N. ;
Wu, Z. .
JOURNAL OF INSTRUMENTATION, 2018, 13
[9]  
Fu Y., 2017, TECHNICAL REPORT WP4
[10]   ReBNet: Residual Binarized Neural Network [J].
Ghasemzadeh, Mohammad ;
Samragh, Mohammad ;
Koushanfar, Farinaz .
PROCEEDINGS 26TH IEEE ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2018), 2018, :57-64