Efficient Memory Compression in Deep Neural Networks Using Coarse-Grain Sparsification for Speech Applications

Cited by: 21
Authors
Kadetotad, Deepak [1 ]
Arunachalam, Sairam [2 ]
Chakrabarti, Chaitali [1 ]
Seo, Jae-sun [1 ]
Affiliations
[1] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85281 USA
[2] Arizona State Univ, Sch Comp Informat Decis Syst Engn, Tempe, AZ 85281 USA
Source
2016 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD) | 2016
Funding
U.S. National Science Foundation;
Keywords
Speech recognition; keyword detection; memory compression; low power design; deep neural networks;
DOI
10.1145/2966986.2967028
CLC Number
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
Recent breakthroughs in deep neural networks have led to the proliferation of their use in image and speech applications. Conventional deep neural networks (DNNs) are fully-connected multi-layer networks with hundreds or thousands of neurons in each layer. Such a network requires a very large weight memory to store the connectivity between neurons. In this paper, we propose a hardware-centric methodology to design low power neural networks with significantly smaller memory footprint and computation resource requirements. We achieve this by judiciously dropping connections in large blocks of weights. The corresponding technique, termed coarse-grain sparsification (CGS), introduces hardware-aware sparsity during DNN training, which leads to efficient weight memory compression and significant computation reduction during classification without losing accuracy. We apply the proposed approach to DNN design for keyword detection and speech recognition. When the two DNNs are trained with 75% of the weights dropped and classified with 5-6 bit weight precision, the weight memory requirement is reduced by 95% compared to their fully-connected counterparts with double precision, while maintaining similar performance in keyword detection accuracy, word error rate, and sentence error rate. To validate this technique in real hardware, a time-multiplexed architecture using a shared multiply and accumulate (MAC) engine was implemented in 65 nm and 40 nm low power (LP) CMOS. In 40 nm at 0.6 V, the keyword detection network consumes 7 µW and the speech recognition network consumes 103 µW, making this technique highly suitable for mobile and wearable devices.
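As a rough illustration of the block-wise connection dropping the abstract describes, the following NumPy sketch partitions a weight matrix into fixed-size blocks, keeps only the 25% of blocks with the largest mean absolute weight, and expands the per-block decision back into a full weight mask. The 16x16 block size and the magnitude-based block selection are illustrative assumptions, not the paper's exact CGS settings.

import numpy as np

def make_cgs_mask(weights, block_size=(16, 16), drop_ratio=0.75):
    # Coarse-grain sparsification (CGS) sketch: partition the weight
    # matrix into blocks and keep only the blocks with the largest
    # mean absolute weight. Block size and the magnitude criterion
    # are assumptions for illustration, not the paper's settings.
    rows, cols = weights.shape
    br, bc = block_size
    assert rows % br == 0 and cols % bc == 0
    scores = np.abs(
        weights.reshape(rows // br, br, cols // bc, bc)
    ).mean(axis=(1, 3))
    n_keep = int(round(scores.size * (1.0 - drop_ratio)))
    threshold = np.sort(scores, axis=None)[-n_keep]
    block_mask = (scores >= threshold).astype(weights.dtype)
    # Expand the per-block mask back to the full weight shape.
    return np.kron(block_mask, np.ones((br, bc), dtype=weights.dtype))

# Usage: re-apply the mask after each weight update so dropped
# blocks stay zero throughout training and inference.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)
mask = make_cgs_mask(W)
W *= mask
print(f"fraction of weights kept: {mask.mean():.2f}")  # ~0.25

Because entire blocks are zero, compressed storage needs only one index per surviving block rather than one per weight, which is what makes this form of sparsity hardware-friendly compared to fine-grained (per-weight) pruning.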
Pages: 8