Kronecker CP Decomposition With Fast Multiplication for Compressing RNNs

Cited by: 14
Authors
Wang, Dingheng [1 ]
Wu, Bijiao [1 ]
Zhao, Guangshe [1 ]
Yao, Man [1 ]
Chen, Hengnu [2 ,3 ]
Deng, Lei [2 ,3 ]
Yan, Tianyi [4 ]
Li, Guoqi [2 ,3 ]
Affiliations
[1] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Sch Automat Sci & Engn, Xian 710049, Peoples R China
[2] Tsinghua Univ, Ctr Brain Inspired Comp Res, Dept Precis Instrumentat, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Beijing Innovat Ctr Future Chip, Beijing 100084, Peoples R China
[4] Beijing Inst Technol, Sch Life Sci, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Tensors; Recurrent neural networks; Matrix decomposition; Computational complexity; Topology; Sparse matrices; Task analysis; Fast multiplication; Kronecker CP decomposition; Kronecker tensor (KT) decomposition; network compression; recurrent neural networks (RNNs); NEURAL-NETWORK ARCHITECTURES; TENSOR DECOMPOSITION; ACCELERATION; LSTM;
D O I
10.1109/TNNLS.2021.3105961
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Recurrent neural networks (RNNs) are powerful in tasks oriented to sequential data, such as natural language processing and video recognition. However, because modern RNNs have complex topologies and expensive space/computation complexity, compressing them has become a hot and promising topic in recent years. Among the many compression methods, tensor decomposition, e.g., tensor train (TT), block term (BT), tensor ring (TR), and hierarchical Tucker (HT), appears to be the most promising approach because a very high compression ratio might be obtained. Nevertheless, none of these tensor decomposition formats can provide both space and computation efficiency. In this article, we consider compressing RNNs based on a novel Kronecker CANDECOMP/PARAFAC (KCP) decomposition, which is derived from Kronecker tensor (KT) decomposition, by proposing two fast algorithms for multiplication between the input and the tensor-decomposed weight. According to our experiments on the UCF11, YouTube Celebrities Face, UCF50, TIMIT, TED-LIUM, and Spiking Heidelberg Digits datasets, the proposed KCP-RNNs achieve accuracy comparable to those in other tensor-decomposed formats, and a compression ratio of up to 278,219x can be obtained with low-rank KCP. More importantly, KCP-RNNs are efficient in both space and computation complexity compared with other tensor-decomposed ones. Besides, we find that KCP has the best potential for parallel computing to accelerate calculations in neural networks.
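The fast multiplication claimed in the abstract rests on a standard property of Kronecker-structured weights: a sum of Kronecker products can multiply an input vector without ever materializing the full weight matrix. A minimal NumPy sketch of that idea follows; it is not the authors' exact KCP algorithm, and the factor shapes and rank are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative sizes: weight W is (m*p) x (n*q), with CP-style rank R.
m, n, p, q, R = 4, 5, 3, 6, 3
A = rng.standard_normal((R, m, n))   # left Kronecker factors
B = rng.standard_normal((R, p, q))   # right Kronecker factors
x = rng.standard_normal(n * q)       # input vector

# Naive path: materialize W = sum_r kron(A_r, B_r), then multiply.
W = sum(np.kron(A[r], B[r]) for r in range(R))   # shape (m*p, n*q)
y_naive = W @ x

# Fast path: with row-major vec, kron(A, B) @ vec(X) = vec(A @ X @ B.T),
# so each rank-1 term costs O(mnq + mpq) instead of O(mp * nq).
X = x.reshape(n, q)
y_fast = sum((A[r] @ X @ B[r].T).reshape(-1) for r in range(R))

assert np.allclose(y_naive, y_fast)
```

The same identity is what makes Kronecker-factorized layers attractive for compression: the factors store O(R(mn + pq)) numbers instead of O(mnpq), and the multiplication cost drops by a similar structure.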
Pages: 2205-2219
Page count: 15