A 47-nW Voice Activity Detector (VAD) Featuring a Short-Time CNN Feature Extractor and an RNN-Based Classifier With a Non-Volatile CAP-ROM

Cited by: 8

Authors:

Lin, Jinhai [1,2]

Un, Ka-Fai [1,2]

Yu, Wei-Han [1,2]

Martins, Rui P. [1,2,3]

Mak, Pui-In [1,2]

Affiliations:

[1] Univ Macau, Inst Microelect, State Key Lab Analog & Mixed Signal VLSI, Macau, Peoples R China

[2] Univ Macau, Fac Sci & Technol ECE, Macau, Peoples R China

[3] Univ Lisbon, Inst Super Tecn, P-1649004 Lisbon, Portugal

Source:

IEEE JOURNAL OF SOLID-STATE CIRCUITS | 2023 / Vol. 58 / No. 11

Keywords:

Capacitor-ROM (CAP-ROM); edge computing; feature extraction; model reduction; non-volatile memory; receptive field; recurrent neural network (RNN); switched-capacitor circuits; voice activity detector (VAD); CHIP; CAPACITOR; SYSTEM; SOC

DOI:

10.1109/JSSC.2023.3302791

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification code:

0808; 0809

Abstract:

This article reports an area-and-power-efficient voice activity detector (VAD) for voice-control edge devices. It innovates a short-time convolutional neural network (ST-CNN) and a recurrent neural network (RNN)-based classifier. Such a classifier shortens the extraction window of the ST-CNN while reducing its signal leakage, detection latency, and area and power budgets. The RNN also aids in parameter reduction of the VAD to only 45. We also propose the non-volatile capacitor-ROM (CAP-ROM) as the weight storage, eliminating the volatile memory and related memory access while freeing the VAD from the weight preloading procedure before activation. The non-reconfigurability of the CAP-ROM is acceptable since we verify that the VAD does not strongly depend on the dataset. Training with the Google speech command dataset (GSCD), our VAD in 65-nm CMOS exhibits a 94%/91% overall hit rate on the GSCD/TIMIT dataset with small power (47 nW) and area (0.022 mm²). There is no significant degradation of the hit rate for the supply voltage from 0.9 to 1.3 V or temperature from 0 to 60 °C, substantiating the robustness of the VAD.


Pages: 3020-3029

Page count: 10

References: 26 in total
