Computationally-efficient voice activity detection based on deep neural networks

Cited by: 1
Authors
Xiong, Yan [1]
Berisha, Visar [1]
Chakrabarti, Chaitali [1]
Affiliations
[1] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85281 USA
Source
2021 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2021) | 2021
Keywords
voice activity detection; deep neural network; capsule network; low-power architecture
DOI
10.1109/SiPS52927.2021.00020
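Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract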
Voice activity detection (VAD) is among the first preprocessing steps in most speech processing applications. While there are several very low-power analog solutions, the more recent deep neural network (DNN) based solutions achieve superior VAD performance even in complex noisy backgrounds, at the expense of increased computation. In this paper, we propose a computationally efficient network architecture, ResCap+, for high-performance VAD. ResCap+ operates on short sequences and combines residual blocks in a convolutional neural network, which encode the characteristics of the input spectrum, with a capsule network with LSTM cells, which captures the temporal relationship between these sequences. We evaluate the model on the AMI meeting corpus and show that it outperforms a state-of-the-art DNN-based model in accuracy at approximately 55× lower computational cost. We also present initial hardware performance results on a low-power programmable architecture, Transmuter, and show that it can process each 40 ms input audio sequence with a delay of 15.17 ms, achieving real-time performance.
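The real-time claim in the abstract follows from simple arithmetic: a 40 ms audio sequence processed in 15.17 ms leaves the processor idle for part of each window. The sketch below works that out. The function name `real_time_factor` and the variable names are illustrative and do not come from the paper; only the two timing figures are taken from the abstract.

```python
def real_time_factor(processing_ms: float, window_ms: float) -> float:
    """Ratio of processing time to audio duration; below 1.0 means real time."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return processing_ms / window_ms


# Figures quoted in the abstract.
WINDOW_MS = 40.0       # duration of each input audio sequence
PROCESSING_MS = 15.17  # reported delay per sequence on Transmuter

rtf = real_time_factor(PROCESSING_MS, WINDOW_MS)
is_real_time = rtf < 1.0
headroom_ms = WINDOW_MS - PROCESSING_MS  # slack remaining in each window

print(f"real-time factor: {rtf:.3f}")
print(f"real time: {is_real_time}, headroom per window: {headroom_ms:.2f} ms")
```

A real-time factor below 1.0 means each sequence finishes before the next one has fully arrived, which is the sense in which the reported delay supports real-time operation.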
Pages: 64-69
Number of pages: 6