Speech Activity Detection on YouTube Using Deep Neural Networks

Authors
Ryant, Neville [1]
Liberman, Mark [1]
Yuan, Jiahong [1]
Affiliation
[1] Linguistic Data Consortium, Philadelphia, PA 19104 USA
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013
Funding
National Science Foundation (USA)
Keywords
speech activity detection; voice activity detection; segmentation; deep neural networks;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech activity detection (SAD) is an important first step in speech processing. Commonly used methods (e.g., frame-level classification using Gaussian mixture models (GMMs)) work well under stationary noise conditions, but do not generalize well to domains such as YouTube, where videos may exhibit a diverse range of environmental conditions. One solution is to augment the conventional cepstral features with additional, hand-engineered features (e.g., spectral flux, spectral centroid, multiband spectral entropies) that are robust to changes in environment and recording condition. An alternative approach, explored here, is to learn robust features during the course of training using an appropriate architecture such as deep neural networks (DNNs). In this paper we demonstrate that a DNN whose input consists of multiple frames of mel-frequency cepstral coefficients (MFCCs) yields drastically lower frame-wise error rates on YouTube videos (19.6%) than a conventional GMM-based system (40%).
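Two ideas in the abstract are easy to make concrete: feeding the classifier a window of several consecutive MFCC frames instead of a single frame, and scoring the system by frame-wise error rate. The following is a minimal sketch of both in plain Python, assuming MFCCs have already been computed as one list of floats per frame and labels are 1 for speech and 0 for non-speech. The function names `stack_context` and `frame_error_rate`, the configurable left/right context widths, and padding at the edges by repeating the boundary frame are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence

Frame = List[float]


def stack_context(frames: Sequence[Sequence[float]], left: int, right: int) -> List[Frame]:
    """Splice each frame with `left` preceding and `right` following frames.

    Hypothetical helper: positions before the first frame or after the last
    frame reuse the nearest edge frame (an assumed padding scheme).
    """
    n = len(frames)
    stacked: List[Frame] = []
    for t in range(n):
        window: Frame = []
        for offset in range(-left, right + 1):
            # Clamp the index so out-of-range positions repeat the edge frame.
            idx = min(max(t + offset, 0), n - 1)
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


def frame_error_rate(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of frames whose predicted label differs from the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of frames")
    if not reference:
        raise ValueError("need at least one frame")
    errors = sum(1 for p, r in zip(predicted, reference) if p != r)
    return errors / len(reference)
```

For example, with three 2-dimensional frames `[[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]` and one frame of context on each side, `stack_context` returns three 6-dimensional vectors; the first is `[0.0, 1.0, 0.0, 1.0, 2.0, 3.0]` because the missing left neighbor is replaced by frame 0. With predictions `[1, 0, 1, 1, 0]` against reference `[1, 1, 1, 0, 0]`, two of five frames disagree, so `frame_error_rate` returns 0.4.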
Pages: 728-731
Page count: 4