Deploying and Optimizing Convolutional Neural Networks on Heterogeneous Architecture

Cited by: 0
Authors
Jiang, Junning [1]
Cai, Liang [1]
Dong, Feng [2]
Yu, Kehua [2]
Chen, Ke [2]
Qu, Wei [2]
Jiang, Jianfei [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Microelect, Shanghai 200240, Peoples R China
[2] Beijing iQIYI Sci & Technol Co Ltd, Shanghai, Peoples R China
Source
2019 IEEE 13TH INTERNATIONAL CONFERENCE ON ASIC (ASICON) | 2019
Keywords
CNN; acceleration; FPGA; optimization; data flow; compute precision
DOI
10.1109/asicon47005.2019.8983456
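Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract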
Deploying convolutional neural networks (CNNs) on a hardware platform can accelerate inference and is critical for applying artificial intelligence. In this paper, we design an FPGA+CPU heterogeneous platform to accelerate CNNs. Dataflow optimization, accelerator structure optimization, and compute precision optimization are proposed to improve the performance of the acceleration platform. Different ResNet and MobileNet networks are successfully deployed on the platform. Applying the proposed dataflow and precision optimizations improves inference performance by 3.25x on ResNet. Applying the accelerator structure and precision optimizations improves inference performance by 3.63x on MobileNet.
Pages: 4
Related papers
9 in total
  • [1] DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration
    Abdelfattah, Mohamed S.
    Han, David
    Bitar, Andrew
    DiCecco, Roberto
    O'Connell, Shane
    Shanker, Nitika
    Chu, Joseph
    Prins, Ian
    Fender, Joshua
    Ling, Andrew C.
    Chiu, Gordon R.
    [J]. 2018 28TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2018, : 411 - 418
  • [2] An OpenCL™ Deep Learning Accelerator on Arria 10
    Aydonat, Utku
    O'Connell, Shane
    Capalija, Davor
    Ling, Andrew C.
    Chiu, Gordon R.
    [J]. FPGA'17: PROCEEDINGS OF THE 2017 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS, 2017, : 55 - 64
  • [3] A CNN Accelerator on FPGA Using Depthwise Separable Convolution
    Bai, Lin
    Zhao, Yiming
    Huang, Xinming
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2018, 65 (10) : 1415 - 1419
  • [4] Flexibility: FPGAs and CAD in Deep Learning Acceleration
    Chiu, Gordon R.
    Ling, Andrew C.
    Capalija, Davor
    Bitar, Andrew
    Abdelfattah, Mohamed S.
    [J]. PROCEEDINGS OF THE 2018 INTERNATIONAL SYMPOSIUM ON PHYSICAL DESIGN (ISPD'18), 2018, : 34 - 41
  • [5] Guo B, 2018, INFINITE-DIMENSIONAL DYNAMICAL SYSTEMS, VOL 2: ATTRACTORS AND METHODS, P1, DOI 10.1515/9783110587265
  • [6] Ma Yufei, 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL), P1, DOI 10.1109/ISCAS.2017.8050344
  • [7] Ma Yufei, 2017, INT S FIELD PROGR GA, P45
  • [8] Maximizing CNN Accelerator Efficiency Through Resource Partitioning
    Shen, Yongming
    Ferdman, Michael
    Milder, Peter
    [J]. 44TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2017), 2017, : 535 - 547
  • [9] Zhao R., 2018, INT S FIELD PROGR GA, P285