A Cost-Efficient FPGA-Based CNN-Transformer Using Neural ODE

Cited by: 0
Authors
Okubo, Ikumi [1 ]
Sugiura, Keisuke [1 ]
Matsutani, Hiroki [1 ]
Affiliations
[1] Keio Univ, Grad Sch Sci & Technol, Yokohama 2238522, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Transformers; Field programmable gate arrays; Computational modeling; Accuracy; Attention mechanisms; Quantization (signal); Costs; Training; Load modeling; Mathematical models; Artificial intelligence; machine learning; tiny machine learning;
DOI
10.1109/ACCESS.2024.3480977
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification
0812
Abstract
Transformers have been adopted for image recognition tasks and shown to outperform CNNs and RNNs, although they suffer from high training cost and computational complexity. To address these issues, a recent research trend is a hybrid approach that replaces part of ResNet with an MHSA (Multi-Head Self-Attention) block. In this paper, we propose a lightweight hybrid model that uses a Neural ODE (Ordinary Differential Equation) rather than ResNet as its backbone, so that the number of iterations of a building block can be increased while the same parameters are reused, avoiding growth in parameter size per iteration. The proposed model is deployed on a modest-sized FPGA device for edge computing. The model is further quantized by a QAT (Quantization-Aware Training) scheme to reduce FPGA resource utilization while suppressing accuracy loss. The quantized model achieves 79.68% top-1 accuracy on the STL10 dataset, which contains 96x96-pixel images. The weights of the feature extraction network are stored on-chip to eliminate memory transfer overhead, allowing inference to run seamlessly and faster. The proposed FPGA implementation accelerates the backbone and MHSA parts by 34.01x, achieves an overall 9.85x speedup when the software pre- and post-processing is taken into account, and delivers 7.10x better energy efficiency compared to an ARM Cortex-A53 CPU. The proposed lightweight Transformer model is demonstrated on a Xilinx ZCU104 board for the recognition of 96x96-pixel images and can be applied to different image sizes by modifying the pre-processing layer.
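The key idea of the backbone described above is that a Neural ODE block reuses one set of weights across all of its iterations, so depth can grow without growing the parameter count. The following is a minimal NumPy sketch of that idea under an explicit Euler discretization; it is not the authors' implementation, and the names (`ode_block`, `f`, `W`, `b`, `n_steps`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# One set of parameters, shared by every iteration of the block.
W = rng.standard_normal((dim, dim)) * 0.1
b = np.zeros(dim)

def f(h, t):
    """ODE dynamics dh/dt = f(h, t); here a single affine layer + ReLU."""
    return np.maximum(W @ h + b * t, 0.0)

def ode_block(h, n_steps=4):
    """Euler discretization of h' = f(h, t) on t in [0, 1].

    Each step reuses the same W and b, unlike a ResNet stack where
    every building block carries its own parameters.
    """
    dt = 1.0 / n_steps
    for k in range(n_steps):
        h = h + dt * f(h, k * dt)
    return h

h0 = rng.standard_normal(dim)
h4 = ode_block(h0, n_steps=4)  # 4 "layers" deep, one parameter set
h8 = ode_block(h0, n_steps=8)  # deeper, still the same parameters
```

Increasing `n_steps` deepens the effective network at zero parameter cost, which is what makes the hybrid model small enough for on-chip weight storage on a modest FPGA.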
Pages: 155773-155788 (16 pages)