EdgeRNN: A Compact Speech Recognition Network With Spatio-Temporal Features for Edge Computing

Cited by: 42
Authors
Yang, Shunzhi [1]
Gong, Zheng [1]
Ye, Kai [1]
Wei, Yungen [1]
Huang, Zhenhua [1]
Huang, Zheng [2]
Affiliations
[1] South China Normal Univ, Sch Comp Sci, Guangzhou 510631, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Informat Secur Engn, Shanghai 200240, Peoples R China
Keywords
RNN; speech emotion recognition; speech keywords recognition; edge computing; SYSTEM; LSTM
DOI
10.1109/ACCESS.2020.2990974
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Driven by the vision of the Internet of Things, some research efforts have focused on designing efficient speech recognition networks for edge computing. However, other studies (such as tpool2) do not make full use of the spatial and temporal information in the acoustic features of speech. In this paper, we propose a compact speech recognition network with spatio-temporal features for edge computing, named EdgeRNN. EdgeRNN uses a 1-Dimensional Convolutional Neural Network (1-D CNN) to process the overall spatial information of each frequency domain of the acoustic features, and a Recurrent Neural Network (RNN) to process the temporal information of each frequency domain. In addition, we propose a simplified attention mechanism to enhance the portion of the network that contributes to the final identification. The overall performance of EdgeRNN has been verified on speech emotion recognition and speech keywords recognition. For speech emotion recognition, the IEMOCAP dataset is used, and the unweighted average recall (UAR) reaches 63.98%. For speech keywords recognition, Google's Speech Commands Dataset V1 is used, with a weighted average recall (WAR) of 96.82%. Compared with the experimental results of related efficient networks on Raspberry Pi 3B+, EdgeRNN improves accuracy on both speech emotion recognition and speech keywords recognition.
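The abstract reports two different metrics, UAR and WAR, without defining them. The sketch below shows how they are commonly computed, assuming the standard definitions: UAR is the plain mean of the per-class recalls, and WAR weights each class recall by its share of the samples, which is the same as overall accuracy. The function names and the toy labels are hypothetical and serve only as illustration. They are not taken from the paper, IEMOCAP, or the Speech Commands dataset.

```python
from collections import Counter


def recalls_per_class(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def unweighted_average_recall(y_true, y_pred):
    """UAR: every class counts equally, regardless of its size."""
    recalls = recalls_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def weighted_average_recall(y_true, y_pred):
    """WAR: each class recall is weighted by its share of the samples."""
    totals = Counter(y_true)
    recalls = recalls_per_class(y_true, y_pred)
    n = len(y_true)
    return sum(recalls[label] * totals[label] / n for label in totals)


# Toy, made-up labels with an imbalanced class distribution.
y_true = ["angry", "angry", "angry", "angry", "sad", "sad"]
y_pred = ["angry", "angry", "angry", "sad", "sad", "angry"]

uar = unweighted_average_recall(y_true, y_pred)  # (3/4 + 1/2) / 2
war = weighted_average_recall(y_true, y_pred)    # 4 correct out of 6
print(f"UAR = {uar:.4f}, WAR = {war:.4f}")
```

On imbalanced data the two values differ, which is one reason emotion recognition results on skewed corpora are often reported as UAR.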
Pages: 81468-81478
Page count: 11