LSTM Acceleration with FPGA and GPU Devices for Edge Computing Applications in B5G MEC

Cited by: 2
Authors
Danopoulos, Dimitrios [1]
Stamoulias, Ioannis [1,2]
Lentaris, George [1]
Masouros, Dimosthenis [1]
Kanaropoulos, Ioannis [1]
Kakolyris, Andreas Kosmas [1]
Soudris, Dimitrios [1]
Affiliations
[1] National Technical University of Athens, Athens, Greece
[2] National and Kapodistrian University of Athens, Athens, Greece
Source
EMBEDDED COMPUTER SYSTEMS: ARCHITECTURES, MODELING, AND SIMULATION, SAMOS 2022 | 2022, Vol. 13511
Funding
European Union Horizon 2020;
Keywords
5G; Forecasting; Anomaly detection; LSTM; FPGA; GPU;
DOI
10.1007/978-3-031-15074-6_26
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
The advent of AI/ML in B5G and Multi-Access Edge Computing will rely on the acceleration of neural networks. The current work focuses on the acceleration of Long Short-Term Memory (LSTM) kernels, which play a key role in numerous applications. We consider various LSTM sizes while targeting FPGA and GPU hardware for both embedded and server MEC purposes. We systematically perform a design-space exploration to determine the most efficient acceleration approach and the most suitable configuration for each device. We use High-Level Synthesis (HLS) to implement our proposed circuit architectures on Xilinx FPGAs, while for NVIDIA GPUs we use high-level tools such as PyTorch's JIT compiler and ONNX Runtime. Our exploration shows that full parallelization of an LSTM array multiplication quickly overutilizes the FPGA, whereas on GPUs LSTM models can be deployed more easily. Instead, the best approach for FPGAs is to balance the parallelization of LSTM gates against that of the vector multiplications. Our comparative study shows that FPGAs prevail for lightweight LSTM models, whereas GPUs prevail for larger model topologies. Moreover, we show that far-edge and near-edge FPGAs achieve similar latency, while near-edge GPUs can execute one order of magnitude faster than far-edge GPUs. The best results range from 0.3 to 5 ms latency per execution, with acceleration factors of 12x to 174x.
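For illustration, the GPU deployment path named in the abstract (PyTorch's JIT compiler) can be sketched as follows. This is a minimal sketch, not the authors' code: the layer sizes (input_size=128, hidden_size=256), sequence length, and batch size are assumptions chosen only to show the TorchScript workflow, not the paper's configurations.

import torch

# Minimal sketch (assumed sizes, not the paper's configurations):
# an LSTM kernel compiled with PyTorch's JIT and run on a GPU if one is available.
class LSTMKernel(torch.nn.Module):
    def __init__(self, input_size: int = 128, hidden_size: int = 256):
        super().__init__()
        self.lstm = torch.nn.LSTM(input_size, hidden_size, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)  # out: (batch, seq_len, hidden_size)
        return out

device = "cuda" if torch.cuda.is_available() else "cpu"
model = LSTMKernel().to(device).eval()
scripted = torch.jit.script(model)            # TorchScript (JIT) compilation
x = torch.randn(1, 64, 128, device=device)    # (batch, seq_len, input_size)
with torch.no_grad():
    y = scripted(x)
print(y.shape)  # torch.Size([1, 64, 256])

The abstract's alternative GPU path would export the same module to ONNX and execute it with ONNX Runtime, while the FPGA path instead implements custom circuit architectures through High-Level Synthesis.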
Pages: 406-419
Number of pages: 14