FlexCNN: An End-to-end Framework for Composing CNN Accelerators on FPGA

Cited by: 19

Authors:

Basalama, Suhail [1]

Sohrabizadeh, Atefeh [1]

Wang, Jie [1]

Guo, Licheng [1]

Cong, Jason [1]

Affiliations:

[1] Univ Calif Los Angeles, 404 Westwood Blvd Engn,6 Room 468, Los Angeles, CA 90095 USA

Source:

ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS | 2023, Vol. 16, No. 2

Keywords:

FPGA; CNN; ONNX; systolic array; transposed convolution; dilated convolution; OpenPose; U-Net; E-Net;

DOI:

10.1145/3570928

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

With reduced data reuse and parallelism, recent convolutional neural networks (CNNs) create new challenges for FPGA acceleration. Systolic arrays (SAs) are efficient, scalable architectures for convolutional layers, but without proper optimizations, their efficiency drops dramatically for three reasons: (1) the differing dimensions within same-type layers, (2) the different convolution layer types, especially transposed and dilated convolutions, and (3) the CNN's complex dataflow graph. Furthermore, significant overheads arise when integrating FPGAs into machine learning frameworks. Therefore, we present a flexible, composable architecture called FlexCNN, which delivers high computation efficiency by employing dynamic tiling, layer fusion, and data layout optimizations. Additionally, we implement a novel versatile SA to process normal, transposed, and dilated convolutions efficiently. FlexCNN also uses a fully pipelined software-hardware integration that alleviates the software overheads. Moreover, with an automated compilation flow, FlexCNN takes a CNN in the ONNX representation, performs a design space exploration, and generates an FPGA accelerator. The framework is tested using three complex CNNs: OpenPose, U-Net, and E-Net. The architecture optimizations achieve a 2.3x performance improvement. Compared to a standard SA, the versatile SA achieves close-to-ideal speedups of up to 15.98x and 13.42x for transposed and dilated convolutions, respectively, at a 6% average area overhead. The pipelined integration leads to a 5x speedup for OpenPose.
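To make the transposed-convolution inefficiency concrete, here is a minimal 1-D sketch in plain Python. It assumes a baseline that runs a transposed convolution as a normal convolution over a zero-inserted input, which is one common formulation; the abstract does not state how the standard SA baseline works. The function names `transposed_conv1d_direct` and `transposed_conv1d_zero_insert` are hypothetical and are not part of FlexCNN. The sketch only counts how many multiply-accumulates (MACs) touch real input elements. It does not model the versatile SA itself.

```python
def transposed_conv1d_direct(x, w, stride):
    """Scatter form: each input element adds a scaled copy of the kernel to the output."""
    n, k = len(x), len(w)
    out = [0.0] * ((n - 1) * stride + k)
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i * stride + j] += xi * wj
    return out


def transposed_conv1d_zero_insert(x, w, stride):
    """Zero-insertion form: upsample with zeros, pad, then run a normal convolution.

    Returns (output, total_macs, useful_macs). A MAC counts as useful only if
    its input operand is a real input element rather than an inserted or
    padded zero. This is an assumed baseline, not the paper's design.
    """
    n, k = len(x), len(w)
    # Insert (stride - 1) zeros between neighbours and remember the real slots.
    up = [0.0] * ((n - 1) * stride + 1)
    real = [False] * len(up)
    for i, xi in enumerate(x):
        up[i * stride] = xi
        real[i * stride] = True
    # Pad k - 1 zeros on both sides so every kernel tap reaches every input.
    pad = k - 1
    up = [0.0] * pad + up + [0.0] * pad
    real = [False] * pad + real + [False] * pad
    flipped = w[::-1]  # a transposed convolution uses the kernel flipped
    out, total, useful = [], 0, 0
    for o in range(len(up) - k + 1):
        acc = 0.0
        for j in range(k):
            acc += up[o + j] * flipped[j]
            total += 1
            useful += real[o + j]
        out.append(acc)
    return out, total, useful


if __name__ == "__main__":
    x, w, stride = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0], 2
    out, total, useful = transposed_conv1d_zero_insert(x, w, stride)
    assert out == transposed_conv1d_direct(x, w, stride)
    print(f"total MACs: {total}, useful MACs: {useful}")
```

For 4 inputs, a 3-tap kernel, and stride 2, the zero-insertion form performs 27 MACs, of which 12 involve real inputs. In this 1-D setting the wasted share grows roughly in proportion to the stride, and in 2-D roughly with its square, which is the kind of overhead an architecture that skips the inserted zeros can recover.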


Pages: 32
