Robust DNN-based VAD augmented with phone entropy based rejection of background speech

Cited by: 3
Authors
Fujita, Yuya [1]
Iso, Ken-ichi [1]
Affiliation
[1] Yahoo Japan Corp, Tokyo, Japan
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
Voice Activity Detection; Deep Neural Network; Entropy
DOI
10.21437/Interspeech.2016-136
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We propose a DNN-based voice activity detector (VAD) augmented by entropy-based frame rejection. A DNN-based VAD classifies each frame as speech or non-speech and achieves significantly higher performance than conventional statistical model-based VAD. We observed that many of the remaining errors are false alarms caused by background human speech, such as TV or radio audio or the conversations of nearby people. To reject such background speech frames, we introduce an entropy-based confidence measure computed from the phone posterior probabilities output by a DNN-based acoustic model. Compared to the target speaker's voice, background speech tends to be pronounced less clearly or to be contaminated by other types of noise, so its entropy is larger than that of audio containing only the target speaker's voice. Combining the DNN-based VAD with the entropy criterion, we reject frames that the DNN-based VAD classifies as speech but whose entropy exceeds a threshold. We evaluated the proposed approach and confirmed a reduction of more than 10% in sentence error rate.
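The rejection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of natural-log Shannon entropy, the per-frame (unsmoothed) decision, and both threshold values are assumptions chosen only for illustration, since the abstract does not specify them.

```python
import math


def phone_entropy(posteriors):
    """Shannon entropy (in nats) of one frame's phone posterior distribution.

    Assumes `posteriors` is a sequence of probabilities summing to 1.
    Zero-probability entries contribute nothing to the sum.
    """
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def reject_background_speech(vad_speech_probs, phone_posteriors,
                             vad_threshold=0.5, entropy_threshold=1.0):
    """Return one boolean per frame: True if the frame is kept as speech.

    A frame is kept only if the VAD score reaches `vad_threshold` AND the
    phone posterior entropy does not exceed `entropy_threshold`.
    Both thresholds are hypothetical values, not taken from the paper.
    """
    decisions = []
    for speech_prob, posteriors in zip(vad_speech_probs, phone_posteriors):
        is_speech = speech_prob >= vad_threshold
        # High entropy means the acoustic model is unsure which phone it
        # hears, which the abstract associates with background speech.
        if is_speech and phone_entropy(posteriors) > entropy_threshold:
            is_speech = False
        decisions.append(is_speech)
    return decisions


# Three toy frames over a hypothetical 4-phone inventory.
vad_scores = [0.9, 0.8, 0.1]
posteriors = [
    [0.97, 0.01, 0.01, 0.01],  # confident phone posterior: low entropy
    [0.25, 0.25, 0.25, 0.25],  # flat phone posterior: high entropy
    [0.97, 0.01, 0.01, 0.01],  # VAD already says non-speech
]
kept = reject_background_speech(vad_scores, posteriors)
```

In this toy input, the second frame passes the VAD but is rejected because its flat posterior has entropy ln 4, which is above the assumed threshold of 1.0. In practice the threshold would need to be tuned on held-out data.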
Pages: 3663-3667
Number of pages: 5
Related papers
50 items in total
  • [41] DNN-Based Speech Enhancement Using Soft Audible Noise Masking for Wind Noise Reduction
    Bai, Haichuan
    Ge, Fengpei
    Yan, Yonghong
    CHINA COMMUNICATIONS, 2018, 15 (09) : 235 - 243
  • [42] DNN-Based Knee OA Severity Prediction System: Pathologically Robust Feature Engineering Approach
    Ruikar D.
    Kamble P.
    Ruikar A.
    Houde K.
    Hegadi R.
    SN Computer Science, 4 (1)
  • [43] DNN-Based Full-Band Speech Synthesis Using GMM Approximation of Spectral Envelope
    Koguchi, Junya
    Takamichi, Shinnosuke
    Morise, Masanori
    Saruwatari, Hiroshi
    Sagayama, Shigeki
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (12) : 2673 - 2681
  • [44] DNN-Based Speech Enhancement Using Soft Audible Noise Masking for Wind Noise Reduction
    Haichuan Bai
    Fengpei Ge
    Yonghong Yan
    China Communications, 2018, 15 (09) : 235 - 243
  • [45] ON GENERATING MIXING NOISE SIGNALS WITH BASIS FUNCTIONS FOR SIMULATING NOISY SPEECH AND LEARNING DNN-BASED SPEECH ENHANCEMENT MODELS
    Wen, Shi-Xue
    Du, Jun
    Lee, Chin-Hui
    2017 IEEE 27TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2017,
  • [46] DNN-based anomaly prediction for the uncertainty in visual SLAM
    Bosdelekidis, Vasileios
    Johansen, Tor A.
    Sokolova, Nadezda
    2022 17TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV), 2022, : 684 - 691
  • [47] Towards breaking DNN-based audio steganalysis with GAN
    Wang, Jie
    Wang, Rangding
    Dong, Li
    Yan, Diqun
    Zhang, Xueyuan
    Lin, Yuzhen
    INTERNATIONAL JOURNAL OF AUTONOMOUS AND ADAPTIVE COMMUNICATIONS SYSTEMS, 2021, 14 (04) : 371 - 383
  • [48] DNN-based Models for Speaker Age and Gender Classification
    Qawaqneh, Zakariya
    Abu Mallouh, Arafat
    Barkana, Buket D.
    PROCEEDINGS OF THE 10TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 4: BIOSIGNALS, 2017, : 106 - 111
  • [49] DNN-Based Duration Modeling for Synthesizing Short Sentences
    Nagy, Peter
    Nemeth, Geza
    Speech and Computer, 2016, 9811 : 254 - 261
  • [50] On the Issue of Calibration in DNN-based Speaker Recognition Systems
    McLaren, Mitchell
    Castan, Diego
    Ferrer, Luciana
    Lawson, Aaron
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1825 - 1829