Robust DNN-based VAD augmented with phone entropy based rejection of background speech

Cited by: 3

Authors:

Fujita, Yuya [1]

Iso, Ken-ichi [1]

Affiliation:

[1] Yahoo Japan Corp, Tokyo, Japan

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

Voice Activity Detection; Deep Neural Network; Entropy

DOI:

10.21437/Interspeech.2016-136

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

We propose a DNN-based voice activity detector (VAD) augmented by entropy-based frame rejection. DNN-based VAD classifies each frame as speech or non-speech and achieves significantly higher VAD performance than conventional statistical model-based VAD. We observed that many of the remaining errors are false alarms caused by background human speech, such as TV or radio audio or the conversations of nearby people. To reject such background speech frames, we introduce an entropy-based confidence measure computed from the phone posterior probabilities output by a DNN-based acoustic model. Compared to the target speaker's voice, background speech tends to be less clearly pronounced or is contaminated by other types of noise, so its entropy is larger than that of audio containing only the target speaker's voice. Combining DNN-based VAD with the entropy criterion, we reject frames that the DNN-based VAD classifies as speech but whose entropy exceeds a threshold value. We evaluated the proposed approach and confirmed a reduction of more than 10% in Sentence Error Rate.
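The rejection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of the natural logarithm, the assumption that each frame's phone posteriors sum to 1, and the threshold value are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import math


def frame_entropy(posterior):
    """Shannon entropy (in nats) of one frame's phone posterior distribution.

    Assumes `posterior` is a sequence of probabilities summing to 1.
    Zero-probability phones contribute nothing to the sum.
    """
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def reject_background_speech(vad_is_speech, phone_posteriors, entropy_threshold):
    """Keep a frame as speech only if the VAD marked it as speech AND its
    phone-posterior entropy does not exceed the threshold.

    vad_is_speech     : per-frame booleans from a (hypothetical) DNN-based VAD
    phone_posteriors  : per-frame phone posterior vectors from an acoustic model
    entropy_threshold : illustrative value; the abstract gives no number
    """
    return [
        bool(is_speech) and frame_entropy(posterior) <= entropy_threshold
        for is_speech, posterior in zip(vad_is_speech, phone_posteriors)
    ]


# Toy example: frame 0 has a peaked posterior (clear pronunciation),
# frame 1 has a flat posterior (ambiguous, treated as background speech),
# frame 2 was already classified as non-speech by the VAD.
vad = [True, True, False]
posteriors = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
]
kept = reject_background_speech(vad, posteriors, entropy_threshold=1.0)
```

In this toy case only the first frame survives: its peaked posterior gives a low entropy, while the flat posterior of the second frame has the maximum entropy for four phones, ln 4, which exceeds the illustrative threshold.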


Pages: 3663-3667

Number of pages: 5
