Robust DNN-based VAD augmented with phone entropy based rejection of background speech

Cited by: 3

Authors:

Fujita, Yuya [1]

Iso, Ken-ichi [1]

Affiliation:

[1] Yahoo Japan Corp, Tokyo, Japan

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

Voice Activity Detection; Deep Neural Network; Entropy

DOI:

10.21437/Interspeech.2016-136

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

We propose a DNN-based voice activity detector (VAD) augmented by entropy-based frame rejection. A DNN-based VAD classifies each frame as speech or non-speech and achieves significantly higher performance than conventional statistical model-based VAD. We observed that many of the remaining errors are false alarms caused by background human speech, such as TV or radio audio or nearby people's conversations. To reject such background speech frames, we introduce an entropy-based confidence measure computed from the phone posterior probabilities output by a DNN-based acoustic model. Compared to the target speaker's voice, background speech tends to be less clearly pronounced or contaminated by other types of noise, so its entropy is larger than that of audio containing only the target speaker's voice. Combining the DNN-based VAD with the entropy criterion, we reject frames that the DNN-based VAD classifies as speech but whose entropy exceeds a threshold. We evaluated the proposed approach and confirmed a reduction of more than 10% in Sentence Error Rate.
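
The rejection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of Shannon entropy in nats, the per-frame decision without any smoothing, and the threshold value are all assumptions made for illustration. The abstract states only that frames classified as speech are rejected when the entropy of the phone posterior exceeds a threshold.

```python
import math
from typing import List, Sequence


def phone_entropy(posteriors: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one frame's phone posterior distribution.

    Zero-probability phones contribute nothing, so they are skipped.
    """
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def reject_background_speech(
    vad_is_speech: Sequence[bool],
    phone_posteriors: Sequence[Sequence[float]],
    entropy_threshold: float,
) -> List[bool]:
    """Keep a frame as speech only if the VAD labels it speech AND its
    phone-posterior entropy does not exceed the (hypothetical) threshold."""
    if len(vad_is_speech) != len(phone_posteriors):
        raise ValueError("one posterior vector is required per VAD frame")
    return [
        bool(is_speech) and phone_entropy(post) <= entropy_threshold
        for is_speech, post in zip(vad_is_speech, phone_posteriors)
    ]


if __name__ == "__main__":
    # Toy input: frame 0 has a peaked posterior (clear speech), frame 1 has a
    # flat posterior (ambiguous, treated here as background speech), and
    # frame 2 was already labelled non-speech by the VAD.
    vad = [True, True, False]
    posteriors = [
        [0.94, 0.02, 0.02, 0.02],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
    ]
    print(reject_background_speech(vad, posteriors, entropy_threshold=1.0))
```

In this toy input the flat posterior has entropy ln 4 (about 1.39 nats), which exceeds the illustrative threshold of 1.0, so the second frame is rejected even though the VAD labelled it speech. In practice the threshold would have to be tuned on held-out data.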

Pages: 3663-3667

Page count: 5
