A 47-nW Voice Activity Detector (VAD) Featuring a Short-Time CNN Feature Extractor and an RNN-Based Classifier With a Non-Volatile CAP-ROM

Cited by: 8
Authors
Lin, Jinhai [1, 2]
Un, Ka-Fai [1, 2]
Yu, Wei-Han [1, 2]
Martins, Rui P. [1, 2, 3]
Mak, Pui-In [1, 2]
Affiliations
[1] Univ Macau, Inst Microelect, State Key Lab Analog & Mixed Signal VLSI, Macau, Peoples R China
[2] Univ Macau, Fac Sci & Technol ECE, Macau, Peoples R China
[3] Univ Lisbon, Inst Super Tecn, P-1649004 Lisbon, Portugal
Keywords
Capacitor-ROM (CAP-ROM); edge computing; feature extraction; model reduction; non-volatile memory; receptive field; recurrent neural network (RNN); switched-capacitor circuits; voice activity detector (VAD); CHIP; CAPACITOR; SYSTEM; SOC
DOI
10.1109/JSSC.2023.3302791
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract
This article reports an area- and power-efficient voice activity detector (VAD) for voice-controlled edge devices. It introduces a short-time convolutional neural network (ST-CNN) feature extractor and a recurrent neural network (RNN)-based classifier. The classifier shortens the extraction window of the ST-CNN while reducing its signal leakage, detection latency, and area and power budgets. The RNN also helps reduce the VAD's parameter count to only 45. We also propose a non-volatile capacitor-ROM (CAP-ROM) for weight storage, which eliminates the volatile memory and its associated memory accesses and frees the VAD from preloading weights before activation. The non-reconfigurability of the CAP-ROM is acceptable because we verify that the VAD does not strongly depend on the dataset. Trained with the Google speech command dataset (GSCD), our VAD in 65-nm CMOS exhibits a 94%/91% overall hit rate on the GSCD/TIMIT datasets at low power (47 nW) and small area (0.022 mm²). The hit rate does not degrade significantly for supply voltages from 0.9 to 1.3 V or temperatures from 0 to 60 °C, substantiating the robustness of the VAD.
Pages: 3020-3029
Number of pages: 10