A Low-Power Speech Recognizer and Voice Activity Detector Using Deep Neural Networks

Cited by: 68

Authors:

Price, Michael [1]

Glass, James [1]

Chandrakasan, Anantha P. [1]

Affiliation:

[1] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA

Source:

IEEE JOURNAL OF SOLID-STATE CIRCUITS | 2018 / Vol. 53 / No. 01

Keywords:

CMOS digital integrated circuits; deep neural networks (DNNs); speech recognition; voice activity detection (VAD); weighted finite-state transducers (WFSTs); MEMORY;

DOI:

10.1109/JSSC.2017.2752838

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

This paper describes digital circuit architectures for automatic speech recognition (ASR) and voice activity detection (VAD) with improved accuracy, programmability, and scalability. Our ASR architecture is designed to minimize off-chip memory bandwidth, which is the main driver of system power consumption. A SIMD processor with 32 parallel execution units efficiently evaluates feed-forward deep neural networks (NNs) for ASR, limiting memory usage with a sparse quantized weight matrix format. We argue that VADs should prioritize accuracy over area and power, and introduce a VAD circuit that uses an NN to classify modulation frequency features with 22.3-μW power consumption. The 65-nm test chip is shown to perform a variety of ASR tasks in real time, with vocabularies ranging from 11 words to 145 000 words and full-chip power consumption ranging from 172 μW to 7.78 mW.


Pages: 66-75

Page count: 10

