Robust DNN-based VAD augmented with phone entropy based rejection of background speech

Cited by: 3
Authors
Fujita, Yuya [1]
Iso, Ken-ichi [1]
Affiliations
[1] Yahoo Japan Corp, Tokyo, Japan
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
Voice Activity Detection; Deep Neural Network; Entropy
DOI
10.21437/Interspeech.2016-136
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We propose a DNN-based voice activity detector (VAD) augmented by entropy-based frame rejection. A DNN-based VAD classifies each frame as speech or non-speech and achieves significantly higher VAD performance than conventional statistical model-based VAD. We observed that many of the remaining errors are false alarms caused by background human speech, such as TV or radio audio or the conversations of nearby people. To reject such background speech frames, we introduce an entropy-based confidence measure computed from the phone posterior probabilities output by a DNN-based acoustic model. Compared to the target speaker's voice, background speech tends to have relatively unclear pronunciation or to be contaminated by other types of noise, so its entropy is larger than that of audio containing only the target speaker's voice. Combining the DNN-based VAD with the entropy criterion, we reject frames that the DNN-based VAD classifies as speech but whose entropy exceeds a threshold value. In our evaluation, the proposed approach reduced the Sentence Error Rate by more than 10%.
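The rejection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the entropy is the Shannon entropy of each frame's phone posterior distribution, and the function names, the toy four-phone posteriors, and the threshold value of 1.0 nats are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def phone_entropy(posteriors: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one frame's phone posterior distribution.

    Assumption: `posteriors` is non-negative and sums to roughly 1.
    Zero-probability phones contribute nothing to the sum.
    """
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def reject_background_speech(
    vad_is_speech: Sequence[bool],
    phone_posteriors: Sequence[Sequence[float]],
    entropy_threshold: float,
) -> List[bool]:
    """Keep a frame as speech only if the VAD says speech AND its
    phone entropy does not exceed the (hypothetical) threshold."""
    if len(vad_is_speech) != len(phone_posteriors):
        raise ValueError("one posterior vector is required per frame")
    return [
        bool(is_speech) and phone_entropy(post) <= entropy_threshold
        for is_speech, post in zip(vad_is_speech, phone_posteriors)
    ]


# Toy data: a peaked posterior (clear speech) and a flat one (unclear speech).
peaked = [0.97, 0.01, 0.01, 0.01]
flat = [0.25, 0.25, 0.25, 0.25]

decisions = reject_background_speech(
    vad_is_speech=[True, True, False],
    phone_posteriors=[peaked, flat, peaked],
    entropy_threshold=1.0,  # illustrative value, not taken from the paper
)
print(decisions)
```

In this toy case the first frame is kept, the second is rejected because its flat posterior has high entropy, and the third stays non-speech because the VAD never marked it as speech. In practice the threshold would have to be tuned on held-out data.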
Pages: 3663-3667
Number of pages: 5
Related papers
50 in total
  • [31] Robust Audio Content Classification Using Hybrid-Based SMD and Entropy-Based VAD
    Wang, Kun-Ching
    ENTROPY, 2020, 22 (02)
  • [32] DNN-based Intelligent Beamforming on a Programmable Metasurface
    Li S.
    Fu S.
    Xu F.
    Journal of Radars, 2021, 10 (02) : 259 - 266
  • [33] DNN-BASED WIRELESS POSITIONING IN AN OUTDOOR ENVIRONMENT
    Lee, Jin-Young
    Eom, Chahyeon
    Kwak, Youngsu
    Kang, Hong-Goo
    Lee, Chungyong
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 3799 - 3803
  • [34] Constrained DNN-Based Robust Model Predictive Control Scheme with Adjustable Error Tube
    Yang, Shizhong
    Liu, Yanli
    Cao, Huidong
    SYMMETRY-BASEL, 2023, 15 (10):
  • [35] DNN-BASED SPEECH PRESENCE PROBABILITY ESTIMATION FOR MULTI-FRAME SINGLE-MICROPHONE SPEECH ENHANCEMENT
    Tammen, Marvin
    Fischer, Doerte
    Meyer, Bernd T.
    Doclo, Simon
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 191 - 195
  • [36] An Efficient Bispectrum Phase Entropy-based Algorithm for VAD
    Gorriz, J. M.
    Ramirez, J.
    Puntonet, C. G.
    Segura, J. C.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2322 - 2325
  • [37] EXPLOITING SPECTRO-TEMPORAL STRUCTURES USING NMF FOR DNN-BASED SUPERVISED SPEECH SEPARATION
    Nie, Shuai
    Liang, Shan
    Li, Hao
    Zhang, XueLiang
    Yang, ZhanLei
    Liu, WenJu
    Dong, LiKe
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 469 - 473
  • [38] Efficient Hardware Implementation of DNN-Based Speech Enhancement Algorithm With Precise Sigmoid Activation Function
    Chiluveru, Samba Raju
    Gyanendra
    Chunarkar, Snehit
    Tripathy, Manoj
    Kaushik, Brajesh Kumar
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2021, 68 (11) : 3461 - 3465
  • [39] Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning
    Hu, Qiong
    Wu, Zhizheng
    Richmond, Korin
    Yamagishi, Junichi
    Stylianou, Yannis
    Maia, Ranniery
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 854 - 858
  • [40] Multi-task learning and Weighted Cross-entropy for DNN-based Keyword Spotting
    Panchapagesan, Sankaran
    Sun, Ming
    Khare, Aparna
    Matsoukas, Spyros
    Mandal, Arindam
    Hoffmeister, Bjorn
    Vitaladevuni, Shiv
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 760 - 764