A Low-Power Speech Recognizer and Voice Activity Detector Using Deep Neural Networks

Cited by: 68
Authors
Price, Michael [1]
Glass, James [1]
Chandrakasan, Anantha P. [1]
Affiliations
[1] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
CMOS digital integrated circuits; deep neural networks (DNNs); speech recognition; voice activity detection (VAD); weighted finite-state transducers (WFSTs); MEMORY;
DOI
10.1109/JSSC.2017.2752838
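Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract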
This paper describes digital circuit architectures for automatic speech recognition (ASR) and voice activity detection (VAD) with improved accuracy, programmability, and scalability. Our ASR architecture is designed to minimize off-chip memory bandwidth, which is the main driver of system power consumption. A SIMD processor with 32 parallel execution units efficiently evaluates feed-forward deep neural networks (NNs) for ASR, limiting memory usage with a sparse quantized weight matrix format. We argue that VADs should prioritize accuracy over area and power, and introduce a VAD circuit that uses an NN to classify modulation frequency features with 22.3-μW power consumption. The 65-nm test chip is shown to perform a variety of ASR tasks in real time, with vocabularies ranging from 11 words to 145,000 words and full-chip power consumption ranging from 172 μW to 7.78 mW.
Pages: 66-75
Number of pages: 10