FlexCNN: An End-to-end Framework for Composing CNN Accelerators on FPGA

Cited by: 19
Authors
Basalama, Suhail [1]
Sohrabizadeh, Atefeh [1]
Wang, Jie [1]
Guo, Licheng [1]
Cong, Jason [1]
Affiliations
[1] Univ Calif Los Angeles, 404 Westwood Blvd Engn,6 Room 468, Los Angeles, CA 90095 USA
Keywords
FPGA; CNN; ONNX; systolic array; transposed convolution; dilated convolution; OpenPose; U-Net; E-Net
DOI
10.1145/3570928
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With reduced data reuse and parallelism, recent convolutional neural networks (CNNs) create new challenges for FPGA acceleration. Systolic arrays (SAs) are efficient, scalable architectures for convolutional layers, but without proper optimizations, their efficiency drops dramatically for three reasons: (1) the differing dimensions among layers of the same type, (2) the variety of convolution layer types, especially transposed and dilated convolutions, and (3) the complex dataflow graphs of CNNs. Furthermore, significant overheads arise when integrating FPGAs into machine learning frameworks. Therefore, we present a flexible, composable architecture called FlexCNN, which delivers high computation efficiency by employing dynamic tiling, layer fusion, and data layout optimizations. Additionally, we implement a novel versatile SA to process normal, transposed, and dilated convolutions efficiently. FlexCNN also uses a fully pipelined software-hardware integration that alleviates the software overheads. Moreover, with an automated compilation flow, FlexCNN takes a CNN in the ONNX representation, performs a design space exploration, and generates an FPGA accelerator. The framework is tested using three complex CNNs: OpenPose, U-Net, and E-Net. The architecture optimizations achieve a 2.3x performance improvement. Compared to a standard SA, the versatile SA achieves close-to-ideal speedups of up to 15.98x and 13.42x for transposed and dilated convolutions, respectively, with a 6% average area overhead. The pipelined integration leads to a 5x speedup for OpenPose.
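To illustrate why a plain convolution engine can be inefficient on transposed convolutions, the following is a minimal 1D sketch in Python. It is not FlexCNN's versatile SA; the function names and the MAC-counting convention are assumptions made for this example only. It compares a zero-insertion formulation (upsample with zeros, then run a dense convolution) against a direct scatter formulation that touches only real input values.

```python
def transposed_conv1d_zero_insert(x, w, stride):
    """Hypothetical baseline: insert (stride - 1) zeros between inputs,
    pad, then run a dense convolution. Counts every multiply-accumulate,
    including those whose input operand is an inserted zero."""
    k = len(w)
    up = []
    for i, v in enumerate(x):
        up.append(v)
        if i < len(x) - 1:
            up.extend([0] * (stride - 1))
    padded = [0] * (k - 1) + up + [0] * (k - 1)
    flipped = w[::-1]  # convolution flips the kernel
    out, macs = [], 0
    for o in range(len(padded) - k + 1):
        acc = 0
        for j in range(k):
            acc += padded[o + j] * flipped[j]
            macs += 1
        out.append(acc)
    return out, macs


def transposed_conv1d_direct(x, w, stride):
    """Hypothetical direct form: each real input scatters into the output,
    so no multiply involves an inserted zero."""
    k = len(w)
    out = [0] * ((len(x) - 1) * stride + k)
    macs = 0
    for i, v in enumerate(x):
        for j in range(k):
            out[i * stride + j] += v * w[j]
            macs += 1
    return out, macs


x, w, stride = [1, 2, 3], [1, 10, 100], 2
dense_out, dense_macs = transposed_conv1d_zero_insert(x, w, stride)
direct_out, direct_macs = transposed_conv1d_direct(x, w, stride)
print(dense_out == direct_out, dense_macs, direct_macs)
```

Both functions produce the same output, but the zero-insertion version performs more multiply-accumulates, and the gap widens as the stride grows. This toy count is only meant to show where the wasted work comes from; it makes no claim about how the reported speedups were obtained.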
Pages: 32
Related papers
50 in total
  • [1] DNNVM: End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-Based CNN Accelerators
    Xing, Yu
    Liang, Shuang
    Sui, Lingzhi
    Jia, Xijie
    Qiu, Jiantao
    Liu, Xin
    Wang, Yushun
    Shan, Yi
    Wang, Yu
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2020, 39 (10) : 2668 - 2681
  • [2] Xel-FPGAs: An End-to-End Automated Exploration Framework for Approximate Accelerators in FPGA-Based Systems
    Prabakaran, Bharath Srinivas
    Mrazek, Vojtech
    Vasicek, Zdenek
    Sekanina, Lukas
    Shafique, Muhammad
    2023 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, ICCAD, 2023
  • [3] Sparse R-CNN: An End-to-End Framework for Object Detection
    Sun, Peize
    Zhang, Rufeng
    Jiang, Yi
    Kong, Tao
    Xu, Chenfeng
    Zhan, Wei
    Tomizuka, Masayoshi
    Yuan, Zehuan
    Luo, Ping
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (12) : 15650 - 15664
  • [4] Pflow: An end-to-end heterogeneous acceleration framework for CNN inference on FPGAs
    Wan, Yi
    Xie, Xianzhong
    Yi, Lingjie
    Jiang, Bo
    Chen, Junfan
    Jiang, Yi
    JOURNAL OF SYSTEMS ARCHITECTURE, 2024, 150
  • [5] Toward a Behavioral-Level End-to-End Framework for Silicon Photonics Accelerators
    Lattanzio, Emily
    Zhou, Ranyang
    Roohi, Arman
    Khreishah, Abdallah
    Misra, Durga
    Angizi, Shaahin
    2022 IEEE 13TH INTERNATIONAL GREEN AND SUSTAINABLE COMPUTING CONFERENCE (IGSC), 2022, : 141 - 146
  • [6] An end-to-end RNS CNN Accelerator
    Sakellariou, Vasilis
    Paliouras, Vassilis
    Kouretas, Ioannis
    Saleh, Hani
    Stouraitis, Thanos
    2024 IEEE 6TH INTERNATIONAL CONFERENCE ON AI CIRCUITS AND SYSTEMS, AICAS 2024, 2024, : 75 - 79
  • [7] CNN-based End-to-end Autonomous Driving on FPGA Using TVM and VTA
    Uetsuki, Toshihiro
    Okuyama, Yuichi
    Shin, Jungpil
    2021 IEEE 14TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC 2021), 2021, : 140 - 144
  • [8] An end-to-end test of neutron stars as particle accelerators
    Caraveo, PA
    PROCEEDINGS OF THE X-RAY UNIVERSE 2005, VOLS 1 AND 2, 2006, 604 : 131 - 138
  • [9] A focus module-based lightweight end-to-end CNN framework for voiceprint recognition
    Karthikeyan Velayuthapandian
    Suja Priyadharsini Subramoniam
    Signal, Image and Video Processing, 2023, 17 : 2817 - 2825
  • [10] A CNN-Based End-to-End Learning Framework Toward Intelligent Communication Systems
    Wu, Nan
    Wang, Xudong
    Lin, Bin
    Zhang, Kaiyao
    IEEE ACCESS, 2019, 7 : 110197 - 110204