Robust DNN-based VAD augmented with phone entropy based rejection of background speech

被引:3
|
作者
Fujita, Yuya [1 ]
Iso, Ken-ichi [1 ]
机构
[1] Yahoo Japan Corp, Tokyo, Japan
来源
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016年
关键词
Voice Activity Detection; Deep Neural Network; Entropy; VOICE ACTIVITY DETECTION;
D O I
10.21437/Interspeech.2016-136
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
We propose a DNN-based voice activity detector augmented by entropy based frame rejection. DNN-based VAD classifies a frame into speech or non-speech and achieves significantly higher VAD performance compared to conventional statistical model-based VAD. We observed that many of the remaining errors are false alarms caused by background human speech, such as TV / radio or surrounding peoples' conversations. In order to reject such background speech frames, we introduce an entropy based confidence measure using the phone posterior probability output by a DNN-based acoustic model. Compared to the target speaker's voice background speech tends to have relatively unclear pronunciation or is contaminated by other types of noises so its entropy becomes larger than audio signals with only the target speaker's voice. Combining DNN-based VAD and the entropy criterion, we reject speech frames classified by the DNN-based VAD as having an entropy larger than a threshold value. We have evaluated the proposed approach and confirmed greater than 10% reduction in Sentence Error Rate.
引用
收藏
页码:3663 / 3667
页数:5
相关论文
共 50 条
  • [1] Modeling Long Temporal Contexts for Robust DNN-based Speech Recognition
    Li, Bo
    Sim, Khe Chai
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 353 - 357
  • [2] DNN-Based Speech Synthesis Using Speaker Codes
    Hojo, Nobukatsu
    Ijima, Yusuke
    Mizuno, Hideyuki
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (02): : 462 - 472
  • [3] DNN-Based Speech Synthesis for Arabic: Modelling and Evaluation
    Houidhek, Amal
    Colotte, Vincent
    Mnasri, Zied
    Jouvet, Denis
    STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2018, 2018, 11171 : 9 - 20
  • [4] A study of speaker adaptation for DNN-based speech synthesis
    Wu, Zhizheng
    Swietojanski, Pawel
    Veaux, Christophe
    Renals, Steve
    King, Simon
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 879 - 883
  • [5] DNN-LSTM based VAD algorithm
    Zhang X.
    Niu P.
    Gao F.
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2018, 58 (05): : 509 - 515
  • [6] DNN-Based Linear Prediction Residual Enhancement for Speech Dereverberation
    Feng, Xinyang
    Li, Nuo
    He, Zunwen
    Zhang, Yan
    Zhang, Wancheng
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 541 - 545
  • [7] Investigation of DNN-Based Audio-Visual Speech Recognition
    Tamura, Satoshi
    Ninomiya, Hiroshi
    Kitaoka, Norihide
    Osuga, Shin
    Iribe, Yurie
    Takeda, Kazuya
    Hayamizu, Satoru
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (10): : 2444 - 2451
  • [8] DNN-Based Speech Enhancement via Integrating NMF and CASA
    Yan, Bofang
    Bao, Changchun
    Bai, Zhigang
    2018 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), 2018, : 435 - 439
  • [9] DNN-BASED AR-WIENER FILTERING FOR SPEECH ENHANCEMENT
    Yang, Yan
    Bao, Changchun
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2901 - 2905
  • [10] SYNTHETIC DATA FOR DNN-BASED DOA ESTIMATION OF INDOOR SPEECH
    Gelderblom, Femke B.
    Liu, Yi
    Kvam, Johannes
    Myrvoll, Tor Andre
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 4390 - 4394