Vocell: A 65-nm Speech-Triggered Wake-Up SoC for 10-μW Keyword Spotting and Speaker Verification

Times cited: 66
Authors
Giraldo, Juan Sebastian P. [1]
Lauwereins, Steven [2]
Badami, Komail [3]
Verhelst, Marian [1]
Affiliations
[1] KU Leuven ICTS, Dept Elect Engn, B-3001 Heverlee, Belgium
[2] Televic Rail, B-8870 Izegem, Belgium
[3] CSEM Zurich, CH-8005 Zurich, Switzerland
Funding
European Research Council
Keywords
Keyword spotting (KWS); machine learning (ML) hardware; speaker verification (SV); speech recognition; RECOGNITION
DOI
10.1109/JSSC.2020.2968800
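Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract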
The use of speech-triggered wake-up interfaces in ubiquitous and mobile devices has grown significantly in the last few years. Since these interfaces must always be active, power consumption is one of their primary design metrics. This article presents a complete mixed-signal system-on-chip that interfaces directly to an analog microphone and performs keyword spotting (KWS) and speaker verification (SV) without any further external accesses. Ultra-low power is achieved while maintaining high accuracy on speech recognition tasks through a) an integrated single-chip, digital-friendly design; b) hardware-aware algorithmic optimization; and c) memory- and power-optimized accelerators. The 65-nm implementation achieves 18.3-μW worst-case power consumption, or 10.6 μW in typical real-time scenarios, which is 10× below the state of the art (SoA).
Pages: 868-878
Number of pages: 11