Robust DNN-based VAD augmented with phone entropy based rejection of background speech

Cited by: 3

Authors:

Fujita, Yuya [1]

Iso, Ken-ichi [1]

Affiliation:

[1] Yahoo Japan Corp, Tokyo, Japan

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

Voice Activity Detection; Deep Neural Network; Entropy

DOI:

10.21437/Interspeech.2016-136

Chinese Library Classification:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

We propose a DNN-based voice activity detector (VAD) augmented by entropy-based frame rejection. A DNN-based VAD classifies each frame as speech or non-speech and achieves significantly higher performance than conventional statistical model-based VAD. We observed that many of the remaining errors are false alarms caused by background human speech, such as TV or radio audio or nearby people's conversations. To reject such background speech frames, we introduce an entropy-based confidence measure computed from the phone posterior probabilities output by a DNN-based acoustic model. Compared to the target speaker's voice, background speech tends to be less clearly pronounced or contaminated by other types of noise, so its entropy is larger than that of audio containing only the target speaker's voice. Combining the DNN-based VAD with the entropy criterion, we reject frames that the DNN-based VAD classifies as speech but whose entropy exceeds a threshold value. We evaluated the proposed approach and confirmed a greater than 10% reduction in Sentence Error Rate.
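The rejection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of natural-log Shannon entropy, the per-frame decision without any smoothing, the four-class toy posteriors, and the threshold of 1.0 are all assumptions made for illustration. The abstract states only that frames classified as speech are rejected when the entropy of the phone posteriors exceeds a threshold.

```python
import math
from typing import List, Sequence


def phone_entropy(posteriors: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one frame's phone posterior distribution."""
    # Zero-probability classes contribute nothing to the sum.
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def reject_background_speech(
    vad_is_speech: Sequence[bool],
    phone_posteriors: Sequence[Sequence[float]],
    entropy_threshold: float,
) -> List[bool]:
    """Keep a frame as speech only if the VAD says speech AND its
    phone-posterior entropy does not exceed the (assumed) threshold."""
    return [
        is_speech and phone_entropy(post) <= entropy_threshold
        for is_speech, post in zip(vad_is_speech, phone_posteriors)
    ]


# Toy data: three frames, four hypothetical phone classes.
vad = [True, True, False]
posts = [
    [0.97, 0.01, 0.01, 0.01],  # peaked posterior: low entropy
    [0.25, 0.25, 0.25, 0.25],  # flat posterior: entropy = ln(4)
    [0.97, 0.01, 0.01, 0.01],  # already non-speech according to the VAD
]
decisions = reject_background_speech(vad, posts, entropy_threshold=1.0)
print(decisions)  # [True, False, False]
```

In this toy case the second frame is labeled speech by the VAD but has a flat posterior, so it is rejected; in practice the threshold would have to be tuned on held-out data.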


Pages: 3663-3667

Page count: 5

