Comprehensive Evaluation of OpenCL-Based CNN Implementations for FPGAs

Cited by: 1
Authors
Tapiador-Morales, Ricardo [1]
Rios-Navarro, Antonio [1]
Linares-Barranco, Alejandro [1]
Kim, Minkyu [2]
Kadetotad, Deepak [2]
Seo, Jae-sun [2]
Affiliations
[1] Univ Seville, Robot & Technol Comp Lab, Seville, Spain
[2] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ USA
Keywords
Deep learning; Convolutional Neural Network; Hardware acceleration; OpenCL; FPGA; Caffe; Xilinx; Altera;
DOI
10.1007/978-3-319-59147-6_24
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity in both industry and academia. Of special interest are Convolutional Neural Networks (CNNs), which take inspiration from the hierarchical structure of the visual cortex to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged by memory bottlenecks, since the many convolutional and fully connected layers demand a large amount of communication for parallel computation. Multi-core CPU-based solutions have proven inadequate for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance, but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. OpenCL is commonly used to describe these architectures for execution on GPGPUs or FPGAs. FPGA design solutions are also actively being explored, which allow the memory hierarchy to be implemented using embedded parallel BlockRAMs. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replication and inconsistencies, and makes FPGAs potentially powerful solutions for real-time classification with CNNs. In this paper, the OpenCL co-design frameworks adopted by both Altera and Xilinx for pseudo-automatic development are evaluated. A comprehensive evaluation and comparison for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platform tools, a mature design community and better execution times.
Pages: 271-282
Page count: 12
Related papers
50 in total
  • [1] Melia: A MapReduce Framework on OpenCL-Based FPGAs
    Wang, Zeke
    Zhang, Shuhao
    He, Bingsheng
    Zhang, Wei
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2016, 27 (12) : 3547 - 3560
  • [2] A Study of Data Partitioning on OpenCL-based FPGAs
    Wang, Zeke
    He, Bingsheng
    Zhang, Wei
    2015 25TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, 2015,
  • [3] Relational Query Processing on OpenCL-based FPGAs
    Wang, Zeke
    Paul, Johns
    Ntu, Hui Yan Cheah
    He, Bingsheng
    Zhang, Wei
    2016 26TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2016,
  • [4] Multikernel Data Partitioning With Channel on OpenCL-Based FPGAs
    Wang, Zeke
    Paul, Johns
    He, Bingsheng
    Zhang, Wei
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2017, 25 (06) : 1906 - 1918
  • [5] Improving Data Partitioning Performance on OpenCL-based FPGAs
    Wang, Zeke
    He, Bingsheng
    Zhang, Wei
    2015 IEEE 23RD ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2015, : 34 - 34
  • [6] Query Processing on OpenCL-based FPGAs: Challenges and Opportunities
    Paul, Johns
    He, Bingsheng
    Lau, Chiew Tong
    2018 IEEE 24TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2018), 2018, : 937 - 945
  • [7] An OpenCL-Based FPGA Accelerator for Faster R-CNN
    An, Jianjing
    Zhang, Dezheng
    Xu, Ke
    Wang, Dong
    ENTROPY, 2022, 24 (10)
  • [8] Optimizing OpenCL-Based CNN Design on FPGA with Comprehensive Design Space Exploration and Collaborative Performance Modeling
    Mu, Jiandong
    Zhang, Wei
    Liang, Hao
    Sinha, Sharad
    ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2020, 13 (03)
  • [9] FSCHOL: An OpenCL-based HPC Framework for Accelerating Sparse Cholesky Factorization on FPGAs
    Tavakoli, Erfan Bank
    Riera, Michael
    Quraishi, Masudul Hassan
    Ren, Fengbo
    2021 IEEE 33RD INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2021), 2021, : 209 - 220
  • [10] On-The-Fly Parallel Data Shuffling for Graph Processing on OpenCL-based FPGAs
    Chen, Xinyu
    Bajaj, Ronak
    Chen, Yao
    He, Jiong
    He, Bingsheng
    Wong, Weng-Fai
    Chen, Deming
    2019 29TH INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2019, : 67 - 73