XVDPU: A High-Performance CNN Accelerator on the Versal Platform Powered by the AI Engine

Cited by: 2
Authors
Jia, Xijie [1]
Zhang, Yu [1]
Liu, Guangdong [1]
Yang, Xinlin [1]
Zhang, Tianyu [1]
Zheng, Jia [1]
Xu, Dongdong [1]
Liu, Zhuohuan [1]
Liu, Mengke [1]
Yan, Xiaoyang [1]
Wang, Hong [1]
Zheng, Rongzhang [1]
Wang, Li [1]
Li, Dong [1]
Pareek, Satyaprakash [1]
Weng, Jian [1]
Tian, Lu [1]
Xie, Dongliang [1]
Luo, Hong [1]
Shan, Yi [2]
Affiliations
[1] AMD, 15F Block B China Overseas Int Ctr,Bldg 5 5 Yard, Beijing 100029, Peoples R China
[2] PhiGent Robot, 25F,Tower B,Tsinghua Tongfang High Tech Plaza,1 W, Beijing 100083, Peoples R China
Keywords
ACAP; acceleration; AI Engine; ALU engine; CNN; FPGA; hardware heterogeneous architecture; Versal; IMAGE SUPERRESOLUTION
DOI
10.1145/3617836
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Today, convolutional neural networks (CNNs) are widely used in computer vision applications. However, the trends toward higher accuracy and higher resolution produce larger networks, and computation and I/O requirements become the key bottlenecks. In this article, we propose XVDPU, an AI Engine (AIE)-based CNN accelerator on Versal chips, to meet heavy computation requirements. To resolve the I/O bottleneck, we adopt several techniques to improve data reuse and reduce I/O requirements. An arithmetic logic unit is further proposed that can better balance resource utilization, new feature support, and efficiency of the whole system. We have successfully deployed more than 100 CNN models with our accelerator. Our experimental results show that the 96-AIE-core implementation can achieve 1,653 frames per second (FPS) for ResNet50 on VCK190, which is 9.8x faster than the design on ZCU102 running at 168.5 FPS. The 256-AIE-core implementation can further achieve 4,050 FPS. We propose a tiling strategy to achieve feature-map-stationary for high-definition CNNs with the accelerator, achieving 3.8x FPS improvement on the residual channel attention network and 3.1x on super-efficient super-resolution. This accelerator can also solve the 3D convolution task in disparity estimation, achieving end-to-end performance of 10.1 FPS with all the optimizations.
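The 9.8x figure in the abstract follows directly from the two quoted ResNet50 throughputs (1,653 FPS versus 168.5 FPS). The short Python sketch below only reproduces that arithmetic from the numbers in the abstract. The function names and the per-core metric are illustrative assumptions and do not come from the paper.

```python
# ResNet50 throughput figures quoted in the abstract, in frames per second.
FPS_ZCU102 = 168.5    # earlier design on ZCU102
FPS_AIE96 = 1653.0    # 96-AIE-core implementation on VCK190
FPS_AIE256 = 4050.0   # 256-AIE-core implementation


def speedup(new_fps: float, base_fps: float) -> float:
    """Return how many times faster new_fps is than base_fps."""
    return new_fps / base_fps


def fps_per_core(fps: float, cores: int) -> float:
    """Hypothetical helper (not a metric from the abstract): FPS per AIE core."""
    return fps / cores


if __name__ == "__main__":
    # Ratio behind the abstract's "9.8x faster" claim.
    print(f"96-core vs ZCU102: {speedup(FPS_AIE96, FPS_ZCU102):.1f}x")
    # Further ratios derived from the same quoted numbers.
    print(f"256-core vs ZCU102: {speedup(FPS_AIE256, FPS_ZCU102):.2f}x")
    print(f"96-core FPS per AIE core: {fps_per_core(FPS_AIE96, 96):.2f}")
    print(f"256-core FPS per AIE core: {fps_per_core(FPS_AIE256, 256):.2f}")
```

The per-core numbers are only a rough way to compare the two configurations. The abstract does not report them, and they say nothing about clock rates or memory bandwidth.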
Pages: 24
Related papers
50 in total
  • [21] A High-Performance and Flexible Architecture for Accelerating SDN on the MPSoC Platform
    Sha, Meng
    Guo, Zhichuan
    Guo, Yunfei
    Zeng, Xuewen
    MICROMACHINES, 2022, 13 (11)
  • [22] High-performance Convolutional Neural Network Accelerator Based on Systolic Arrays and Quantization
    Li, Yufeng
    Lu, Shengli
    Luo, Jihe
    Pang, Wei
    Liu, Hao
    2019 IEEE 4TH INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP 2019), 2019, : 335 - 339
  • [23] A High-Performance and Power-Efficient SIMD Convolution Engine for FPGAs
    Spagnolo, Fanny
    Frustaci, Fabio
    Perri, Stefania
    Corsonello, Pasquale
    2020 27TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS (ICECS), 2020,
  • [24] Design of a High-performance Prognostics and Health Management Platform for Weapon Equipment
    Si, Shuhao
    Jing, Bo
    Wang, Yun
    Jiao, Xiaoxuan
    Sun, Meng
    2018 PROGNOSTICS AND SYSTEM HEALTH MANAGEMENT CONFERENCE (PHM-CHONGQING 2018), 2018, : 894 - 899
  • [25] A High-Performance Accelerator for Real-Time Super-Resolution on Edge FPGAs
    Liu, Hongduo
    Qian, Yijian
    Liang, Youqiang
    Zhang, Bin
    Liu, Zhaohan
    He, Tao
    Zhao, Wenqian
    Lu, Jiangbo
    Yu, Bei
    ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 2024, 29 (03)
  • [26] FPGA-based hardware accelerator for high-performance data-stream processing
    Lysakov K.F.
    Shadrin M.Y.
    Pattern Recognition and Image Analysis, 2013, 23 (1) : 26 - 34
  • [27] FPGA-Based High-Performance Data Compression Deep Neural Network Accelerator
    Wang, Hanze
    Fu, Yingxun
    Ma, Li
    2022 INTERNATIONAL CONFERENCE ON BIG DATA, INFORMATION AND COMPUTER NETWORK (BDICN 2022), 2022, : 563 - 569
  • [28] A High-performance Web Attack Detection Method based on CNN-GRU Model
    Niu, Qiangqiang
    Li, Xiaoyong
    PROCEEDINGS OF 2020 IEEE 4TH INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2020), 2020, : 804 - 808
  • [29] FACL: A Flexible and High-Performance ACL engine on FPGA-based SmartNIC
    Jia, Chengjun
    Li, Chenglong
    Li, Yifan
    Hu, Xiaohe
    Li, Jun
    2022 IFIP NETWORKING CONFERENCE (IFIP NETWORKING), 2022,
  • [30] UTPlaceF 2.0: A High-Performance Clock-Aware FPGA Placement Engine
    Li, Wuxi
    Lin, Yibo
    Li, Meng
    Dhar, Shounak
    Pan, David Z.
    ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 2018, 23 (04)