XVDPU: A High-Performance CNN Accelerator on the Versal Platform Powered by the AI Engine

Times cited: 2

Authors:

Jia, Xijie [1]
Zhang, Yu [1]
Liu, Guangdong [1]
Yang, Xinlin [1]
Zhang, Tianyu [1]
Zheng, Jia [1]
Xu, Dongdong [1]
Liu, Zhuohuan [1]
Liu, Mengke [1]
Yan, Xiaoyang [1]
Wang, Hong [1]
Zheng, Rongzhang [1]
Wang, Li [1]
Li, Dong [1]
Pareek, Satyaprakash [1]
Weng, Jian [1]
Tian, Lu [1]
Xie, Dongliang [1]
Luo, Hong [1]
Shan, Yi [2]

Affiliations:

[1] AMD, 15F Block B China Overseas Int Ctr, Bldg 5 5 Yard, Beijing 100029, Peoples R China

[2] PhiGent Robot, 25F, Tower B, Tsinghua Tongfang High Tech Plaza, 1 W, Beijing 100083, Peoples R China

Source:

ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS | 2024 / Vol. 17 / Issue 02

Keywords:

ACAP; acceleration; AI Engine; ALU engine; CNN; FPGA; hardware heterogeneous architecture; Versal; IMAGE SUPERRESOLUTION;

DOI:

10.1145/3617836

CLC number:

TP3 [Computing technology, computer technology]

Discipline code:

0812

Abstract:

Today, convolutional neural networks (CNNs) are widely used in computer vision applications. However, the push for higher accuracy and higher resolution produces ever larger networks, and computation and I/O requirements become the key bottlenecks. In this article, we propose XVDPU, an AI Engine (AIE)-based CNN accelerator on Versal chips that meets heavy computation requirements. To resolve the I/O bottleneck, we adopt several techniques that improve data reuse and reduce I/O requirements. We further propose an arithmetic logic unit (ALU) that better balances resource utilization, support for new features, and the efficiency of the whole system. We have successfully deployed more than 100 CNN models on our accelerator. Our experimental results show that the 96-AIE-core implementation achieves 1,653 frames per second (FPS) for ResNet50 on VCK190, 9.8x faster than the design on ZCU102, which runs at 168.5 FPS. The 256-AIE-core implementation further achieves 4,050 FPS. We propose a tiling strategy that keeps feature maps stationary when running high-definition CNNs on the accelerator, achieving a 3.8x FPS improvement on the residual channel attention network and 3.1x on super-efficient super-resolution. The accelerator also handles the 3D convolution task in disparity estimation, reaching an end-to-end performance of 10.1 FPS with all optimizations applied.
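The 9.8x figure in the abstract is the ratio of the two ResNet50 throughputs it quotes. The short check below recomputes it from those numbers; the second ratio (256-core versus 96-core throughput) is derived here for illustration only and is not a figure stated by the authors.

```python
# ResNet50 throughputs as quoted in the abstract, in frames per second.
xvdpu_96_aie_fps = 1653.0   # XVDPU, 96 AIE cores, VCK190
xvdpu_256_aie_fps = 4050.0  # XVDPU, 256 AIE cores
zcu102_fps = 168.5          # earlier design on ZCU102

# Speedup stated in the abstract: 96-core VCK190 design vs. ZCU102 design.
speedup_vs_zcu102 = xvdpu_96_aie_fps / zcu102_fps

# Derived here (not stated in the abstract): 256-core vs. 96-core throughput.
scaling_256_vs_96 = xvdpu_256_aie_fps / xvdpu_96_aie_fps

print(f"VCK190 (96 AIE) vs ZCU102: {speedup_vs_zcu102:.1f}x")
print(f"256 AIE vs 96 AIE: {scaling_256_vs_96:.2f}x")
```

Running it prints a 9.8x speedup, matching the abstract, and a roughly 2.45x throughput gain for the larger configuration.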


Pages: 24
