Computationally-efficient voice activity detection based on deep neural networks

Cited by: 1

Authors:

Xiong, Yan [1]

Berisha, Visar [1]

Chakrabarti, Chaitali [1]

Affiliations:

[1] Arizona State University, School of Electrical, Computer and Energy Engineering, Tempe, AZ 85281, USA

Source:

2021 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2021) | 2021

Keywords:

voice activity detection; deep neural network; capsule network; low-power architecture

DOI:

10.1109/SiPS52927.2021.00020

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification code:

0808; 0809

Abstract:

Voice activity detection (VAD) is among the first preprocessing steps in most speech processing applications. While several very low-power analog solutions exist, the more recent deep neural network (DNN) based solutions deliver superior VAD performance even in complex noisy backgrounds, at the expense of increased computation. In this paper, we propose a computationally efficient network architecture, ResCap+, for high-performance VAD. ResCap+ operates on small-sized sequences and is built with residual blocks in a convolutional neural network, which encode the characteristics of the input spectrum, and a capsule network with LSTM cells, which captures the temporal relationship between these sequences. We evaluate the model on the AMI meeting corpus and show that it outperforms a state-of-the-art DNN-based model in accuracy at approximately 55× lower computation cost. We also present initial hardware performance results on a low-power programmable architecture, Transmuter, and show that it can process each 40 ms input audio sequence with a delay of 15.17 ms, resulting in real-time performance.
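
The real-time claim in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the quoted 15.17 ms delay is the time needed to process one 40 ms audio sequence, and the helper name `real_time_factor` is hypothetical, not taken from the paper.

```python
def real_time_factor(processing_ms: float, window_ms: float) -> float:
    """Ratio of processing time to audio duration.

    A value below 1.0 means each window is processed faster than
    new audio arrives, i.e. the system keeps up in real time.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return processing_ms / window_ms


# Figures quoted in the abstract (interpretation as per-window
# processing time is an assumption).
WINDOW_MS = 40.0
DELAY_MS = 15.17

rtf = real_time_factor(DELAY_MS, WINDOW_MS)
headroom_ms = WINDOW_MS - DELAY_MS

print(f"real-time factor: {rtf:.3f}")
print(f"headroom per window: {headroom_ms:.2f} ms")
```

Under this reading, the factor is well below 1.0, which is consistent with the abstract's statement that processing runs in real time.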

Pages: 64-69

Page count: 6
