Energy Efficient Memory-based Inference of LSTM by Exploiting FPGA Overlay

Cited: 0
Authors
Guha, Krishnendu [1 ]
Trivedi, Amit Ranjan [2 ]
Bhunia, Swarup [3 ]
Affiliations
[1] Univ Coll Cork, Sch Comp Sci & Informat Technol, Cork, Ireland
[2] Univ Illinois, Dept Elect & Comp Engn, Chicago, IL USA
[3] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
Source
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023
Keywords
LSTM; ML; FPGA; Memory-based Mapping; Energy Efficiency; Computing with Memory; Architecture
DOI
10.1109/IJCNN54540.2023.10191667
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
The fourth industrial revolution (a.k.a. Industry 4.0) relies on intelligent machines that are fully autonomous and can diagnose and resolve operational issues without human intervention. Therefore, embedded computing platforms enabling the necessary computations for intelligent machines are critical for the ongoing industrial revolution. Field-programmable gate arrays (FPGAs) in particular are highly suited for such embedded computing due to their high performance and easy reconfigurability. Many Industry 4.0 applications, such as predictive maintenance, critically depend on real-time and reliable processing of time-series data using recurrent neural network models, especially long short-term memory (LSTM). Therefore, the FPGA-based acceleration of LSTM is imperative for many Industry 4.0 applications. Existing LSTM models for FPGAs consume significant resources and power and are not energy-efficient. Moreover, prior works focusing on reducing latency and power rely mainly on model pruning, which compromises accuracy. In contrast, we propose a memory-based, energy-efficient inference of LSTM by exploiting overlay in FPGA. In our methodology, we pre-compute predominant operations and store them in the available embedded memory blocks (EMBs) of an FPGA. On demand, these pre-computed results are accessed to minimize the necessary workload. Via this methodology, we obtained lower latency, lower power, and better energy efficiency than state-of-the-art LSTM models without any loss of accuracy. Specifically, when implemented on the Zynq XCU104 evaluation board, a 3x reduction in latency and a 5x reduction in power are obtained relative to the reference 16-bit LSTM model.
Pages: 7