Grow and Prune Compact, Fast, and Accurate LSTMs

Cited by: 59

Authors:

Dai, Xiaoliang [1]

Yin, Hongxu [1]

Jha, Niraj K. [1]

Affiliations:

[1] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2020 / Vol. 69 / No. 03

Keywords:

Deep learning; grow-and-prune training; long short-term memory; neural network

DOI:

10.1109/TC.2019.2954495

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Long short-term memory (LSTM) has been widely used for sequential data modeling. Researchers have increased LSTM depth by stacking LSTM cells to improve performance. This incurs model redundancy, increases run-time delay, and makes the LSTMs more prone to overfitting. To address these problems, we propose a hidden-layer LSTM (H-LSTM) that adds hidden layers to LSTM's original one-level nonlinear control gates. H-LSTM increases accuracy while employing fewer external stacked layers, thus reducing the number of parameters and run-time latency significantly. We employ grow-and-prune (GP) training to iteratively adjust the hidden layers through gradient-based growth and magnitude-based pruning of connections. This learns both the weights and the compact architecture of H-LSTM control gates. We have GP-trained H-LSTMs for image captioning, speech recognition, and neural machine translation applications. For the NeuralTalk architecture on the MSCOCO dataset, our three models reduce the number of parameters by 38.7x [floating-point operations (FLOPs) by 45.5x], run-time latency by 4.5x, and improve the CIDEr-D score by 2.8 percent, respectively. For the DeepSpeech2 architecture on the AN4 dataset, the first model we generated reduces the number of parameters by 19.4x and run-time latency by 37.4 percent. The second model reduces the word error rate (WER) from 12.9 to 8.7 percent. For the encoder-decoder sequence-to-sequence network on the IWSLT 2014 German-English dataset, the first model we generated reduces the number of parameters by 10.8x and run-time latency by 14.2 percent. The second model increases the BLEU score from 30.02 to 30.98. Thus, GP-trained H-LSTMs can be seen to be compact, fast, and accurate.
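
The abstract names two operations in GP training: gradient-based growth and magnitude-based pruning of connections. The sketch below illustrates one plausible reading of those two steps on a small connection mask. It is not the authors' implementation. The function names, the `prune_fraction` and `grow_count` parameters, and the toy values are all assumptions made for illustration, and the abstract does not specify the growth policy, thresholds, or training schedule.

```python
# Illustrative sketch only: hypothetical helpers, not the paper's code.
# A mask entry of 1 means the connection is active, 0 means dormant.

def prune_by_magnitude(weights, mask, prune_fraction):
    """Deactivate the fraction of active connections with the smallest |weight|."""
    active = [
        (abs(weights[i][j]), i, j)
        for i, row in enumerate(mask)
        for j, m in enumerate(row)
        if m
    ]
    active.sort()  # smallest magnitudes first
    n_prune = int(len(active) * prune_fraction)
    new_mask = [row[:] for row in mask]  # copy so the input mask is unchanged
    for _, i, j in active[:n_prune]:
        new_mask[i][j] = 0
    return new_mask


def grow_by_gradient(grads, mask, grow_count):
    """Activate the dormant connections with the largest |gradient|."""
    dormant = [
        (abs(grads[i][j]), i, j)
        for i, row in enumerate(mask)
        for j, m in enumerate(row)
        if not m
    ]
    dormant.sort(reverse=True)  # largest gradient magnitudes first
    new_mask = [row[:] for row in mask]
    for _, i, j in dormant[:grow_count]:
        new_mask[i][j] = 1
    return new_mask


# Toy 2x3 layer (assumed values): the third column starts dormant.
weights = [[0.9, -0.05, 0.0], [0.02, -0.7, 0.0]]
grads = [[0.1, 0.2, 0.8], [0.3, 0.1, -0.05]]
mask = [[1, 1, 0], [1, 1, 0]]

pruned = prune_by_magnitude(weights, mask, prune_fraction=0.5)
grown = grow_by_gradient(grads, mask, grow_count=1)
```

In this toy case, pruning removes the two active connections with the smallest weight magnitudes, and growth activates the dormant connection whose gradient magnitude is largest. An iterative scheme would alternate such steps with ordinary weight training, but the exact alternation is left open here because the abstract does not describe it.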

Citation

Pages: 441-452

Page count: 12
