EdgeCRNN: an edge-computing oriented model of acoustic feature enhancement for keyword spotting

Times cited: 8

Authors:

Wei, Yungen [1, 2]

Gong, Zheng [1]

Yang, Shunzhi [1, 2, 3]

Ye, Kai [1, 2, 3]

Wen, Yamin [3]

Affiliations:

[1] South China Normal Univ, Sch Comp Sci, Guangzhou, Peoples R China

[2] GuangDong Polytech Sci & Technol, Comp Engn Tech Coll, Guangzhou, Peoples R China

[3] Guangdong Univ Finance & Econ, Sch Math & Stat, Guangzhou, Peoples R China

Source:

JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING | 2022 / Vol. 13 / Issue 3

Keywords:

Edge computing; Keyword spotting; Convolutional recurrent neural network; Feature enhancement; Lightweight structure

DOI:

10.1007/s12652-021-03022-1

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Keyword Spotting (KWS) is a significant branch of Automatic Speech Recognition (ASR) and is widely used on edge computing devices. The goal of KWS is to achieve high accuracy with a low False Alarm Rate (FAR) while reducing memory, computation, and latency costs. However, the limited resources of edge computing devices make KWS applications challenging to deploy. Lightweight deep learning models and structures have achieved good results in KWS while remaining efficient. In this paper, we present a new Convolutional Recurrent Neural Network (CRNN) architecture for edge computing devices, named EdgeCRNN. EdgeCRNN is based on depthwise separable convolutions and a residual structure, and it uses a feature enhancement method. On the Google Speech Commands Dataset, the experimental results show that EdgeCRNN can process 11.1 audio clips per second on a Raspberry Pi 3B+, 2.2 times the throughput of Tpool2. Compared with Tpool2, EdgeCRNN reaches an accuracy of 98.05% while remaining competitive in overall performance.
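The abstract names depthwise separable convolution and a residual structure as EdgeCRNN's building blocks. The following is a minimal pure-Python sketch of those two generic techniques, not the authors' implementation. The function names, the zero "same" padding, and applying the ReLU on the branch before adding the shortcut are illustrative assumptions. The sketch also counts weights to show why the factorization is lightweight.

```python
"""Hypothetical sketch: depthwise separable convolution with a residual
shortcut, in pure Python. Illustrative only; not the EdgeCRNN source."""

from typing import List

Matrix = List[List[float]]  # one 2-D feature map (time x frequency)
Tensor = List[Matrix]       # channels x time x frequency


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def dsconv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise k x k conv plus a pointwise 1 x 1 conv."""
    return k * k * c_in + c_in * c_out


def depthwise(x: Tensor, kernels: List[Matrix]) -> Tensor:
    """Slide each channel's own k x k kernel over that channel only
    (cross-correlation, odd k, zero padding, output same size as input)."""
    out: Tensor = []
    for fmap, ker in zip(x, kernels):
        h, w, k = len(fmap), len(fmap[0]), len(ker)
        pad = k // 2
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for a in range(k):
                    for b in range(k):
                        ii, jj = i + a - pad, j + b - pad
                        if 0 <= ii < h and 0 <= jj < w:  # outside = zero padding
                            acc += ker[a][b] * fmap[ii][jj]
                res[i][j] = acc
        out.append(res)
    return out


def pointwise(x: Tensor, weights: Matrix) -> Tensor:
    """1 x 1 conv: weights[o][c] scales input channel c into output channel o."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wo[c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)]
             for i in range(h)]
            for wo in weights]


def residual_dsconv(x: Tensor, dw_kernels: List[Matrix],
                    pw_weights: Matrix) -> Tensor:
    """ReLU(pointwise(depthwise(x))) + x. The shortcut requires the pointwise
    layer to keep the channel count (len(pw_weights) == len(x))."""
    y = pointwise(depthwise(x, dw_kernels), pw_weights)
    return [[[max(0.0, v) + x[c][i][j] for j, v in enumerate(row)]
             for i, row in enumerate(ch)]
            for c, ch in enumerate(y)]


if __name__ == "__main__":
    # Weight counts for a 3 x 3 layer with 64 input and 64 output channels.
    print(conv_params(3, 64, 64))    # 36864
    print(dsconv_params(3, 64, 64))  # 4672
```

With k = 3 and 64 channels in and out, the factorized layer needs 4,672 weights instead of 36,864, roughly 7.9 times fewer. This saving is the usual motivation for depthwise separable convolutions in lightweight models. The residual shortcut adds no weights but requires the block to preserve the feature-map shape.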

Pages: 1525-1535

Number of pages: 11

References: 42

