Acceleration of LSTM With Structured Pruning Method on FPGA

Cited by: 34
Authors
Wang, Shaorun [1]
Lin, Peng [1]
Hu, Ruihan [1]
Wang, Hao [1]
He, Jin [1]
Huang, Qijun [1]
Chang, Sheng [1]
Affiliations
[1] Wuhan University, School of Physics and Technology, Wuhan 430072, Hubei, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
FPGA; hardware acceleration; LSTM; pruning; EFFICIENT;
DOI
10.1109/ACCESS.2019.2917312
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This paper focuses on accelerating long short-term memory (LSTM), one of the popular types of recurrent neural networks (RNNs). Because of the large number of weight memory accesses and the high computational complexity of its cascade-dependent structure, implementing the LSTM efficiently on field-programmable gate arrays (FPGAs) is a major challenge. To speed up inference on an FPGA with limited resources, a structured pruning method is proposed that not only reduces the LSTM model's size without loss of prediction accuracy but also eliminates imbalanced computation and irregular memory accesses. In addition, a hardware architecture for the compressed LSTM is designed to pursue high performance. As a result, the implementation of an LSTM language module on a Stratix V GXA7 FPGA achieves 85.2 GOPS directly on the sparse LSTM network with our method, corresponding to an effective throughput of 681.6 GOPS on the dense one, which shows that the proposed structured pruning algorithm yields a 7.82-times speedup when only 1/8 of the parameters are retained. We hope that our method offers an efficient way to accelerate the LSTM and similar recurrent neural networks in resource-limited environments.
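The relationship between the figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration, not code from the paper: it assumes that "effective throughput" means the sparse throughput divided by the fraction of parameters kept, and the names `effective_throughput` and `kept_fraction` are hypothetical.

```python
def effective_throughput(sparse_gops: float, kept_fraction: float) -> float:
    """Dense-equivalent throughput (assumed definition): the rate the dense
    model would need to match the work done on the pruned model."""
    return sparse_gops / kept_fraction


# Figures taken from the abstract.
sparse_gops = 85.2        # measured directly on the sparse LSTM
kept_fraction = 1 / 8     # only 1/8 of the parameters are retained
reported_speedup = 7.82   # speedup reported for the pruned model

effective = effective_throughput(sparse_gops, kept_fraction)

# With 1/8 of the parameters, the ideal speedup would be 8x.
ideal_speedup = 1 / kept_fraction
efficiency = reported_speedup / ideal_speedup

print(f"effective throughput: {effective:.1f} GOPS")
print(f"share of ideal speedup achieved: {efficiency:.2%}")
```

Under this assumed definition, the computed effective throughput matches the 681.6 GOPS stated in the abstract, and the reported 7.82-times speedup is close to the 8-times ideal implied by keeping 1/8 of the parameters.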
Pages: 62930-62937
Number of pages: 8